Storage Devices

Why is storage important?

• Web 2.0 applications are an extension of your Desktop

• SaaS (Software as a service) is here and growing

• Broadband is a reality

• Storage costs are dropping

• Everyone expects near-unlimited storage online – YouTube, Flickr, Facebook et al. are storing
your life online*

• (And yes, let’s not forget your personal BitTorrent collection)

* It would take about 1400 TB to store your entire life in video, or 5700 TB if you also want to record
what was happening around you. Add another 73 TB for the audio of everything you heard (MP3 quality).
That’s about 6000 TB for a copy of your life.

Agenda:

• Hard disks

 ▪ SATA, SAS, FC (Fibre Channel), solid state

• RAID

• DAS

• SAN

• NAS

SATA (Serial Advanced Technology Attachment):

• It is a computer bus interface for connecting a host bus adapter (which connects the host to
network and storage devices) to mass storage devices such as hard drives and optical drives.

• Advantages

 ▪ Reduced cable bulk and cost

 ▪ Faster and more efficient data transfer


SAS (Serial Attached SCSI):

• It is a computer bus used to move data to and from computer storage devices such as
hard drives and tape drives.

SCSI (Small Computer System Interface):

• It is a set of standards for physically connecting and transferring data between computers
and peripheral devices.

FC (Fibre Channel):

• It is a gigabit-speed network technology primarily used for storage networking.

 ▪ Originally used mainly in supercomputing

 ▪ Now the standard connection type for SANs

Choosing your Hard Disk

(SATA, FC, SAS, SCSI, Solid State)

Introduction to Hard Drives:

• Basic physical storage unit (aka physical block device)

• Variables to consider when selecting a drive

 ▪ Type (SAS, SATA, FC)

 ▪ RPM (revolutions per minute)

 ▪ Capacity

 ▪ MTBF (Mean Time Between Failures)

 ▪ Life expectancy

                  SATA                       SAS                        FC
                  (Serial ATA)               (Serial Attached SCSI)     (Fibre Channel)

Typical use       Low-cost, high-volume,     Replacement for SCSI;      High-performance,
                  low-speed, large-storage   high-performance,          transaction-oriented
                  environments;              transaction-oriented       applications with high
                  CDP / backups              applications with high     IOPS requirements
                                             IOPS requirements

Performance       Average                    Good (similar to FC)       Good (similar to SAS)

RPM               Typically 7200             10k / 15k                  10k / 15k

Hard drive        Typically 250 GB, 500 GB,  Typically 73 GB, 146 GB,   Typically 73 GB, 146 GB,
capacities        750 GB, 1 TB               300 GB, 400 GB             300 GB, 400 GB

Hard Disk types:

                  SATA                       SAS                        FC
                  (Serial ATA)               (Serial Attached SCSI)     (Fibre Channel)

Price per GB      $0.33                      $2                         $3
(based on max drive capacity, retail web price)

Miscellaneous     -                          • Backward compatible      -
                                               with SATA
                                             • Allows mixing SATA
                                               drives on the same
                                               backplane

Hard Disk Conclusions:

• For high IOPS, database applications and low storage requirements, you have a choice between
FC and SAS

• SAS currently seems like the better option

• Future SAS standards promise to be faster than FC (though the two are likely to remain neck
and neck)

• For high-storage requirements (video servers, file servers, photo storage, archival, mail
servers, backup servers) SATA is the way to go

• You can combine SAS and SATA to reduce average cost and still meet your goals, especially
since the backplanes are cross-compatible

• Read the spec sheets of the hard drives you plan to use to determine the specifics
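The cost argument for mixing drive types can be sketched with the per-GB prices quoted in the price table ($0.33 for SATA, $2 for SAS, $3 for FC). The function name and the example capacities below are illustrative assumptions, not figures from these notes.

```python
# Retail price per GB quoted in the drive comparison (USD).
PRICE_PER_GB = {"SATA": 0.33, "SAS": 2.0, "FC": 3.0}

def blended_cost(capacity_gb_by_type):
    """Return (total cost in USD, average cost per GB) for a mix of drive types."""
    total_gb = sum(capacity_gb_by_type.values())
    total_cost = sum(PRICE_PER_GB[kind] * gb
                     for kind, gb in capacity_gb_by_type.items())
    return total_cost, total_cost / total_gb

# Hypothetical workload: 300 GB of database on SAS, 3000 GB of archives on SATA.
cost, per_gb = blended_cost({"SAS": 300, "SATA": 3000})
print(f"mixed:   ${cost:.2f} total, ${per_gb:.2f}/GB")

# The same 3300 GB placed entirely on SAS, for comparison.
all_sas_cost, all_sas_per_gb = blended_cost({"SAS": 3300})
print(f"all SAS: ${all_sas_cost:.2f} total, ${all_sas_per_gb:.2f}/GB")
```

Under these assumed capacities the mixed layout costs roughly a quarter of the all-SAS layout, which is the point of keeping only the IOPS-sensitive data on the expensive drives.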

Solid State Drives:

• Use solid-state memory to store persistent data

• Eliminate mechanical parts

• Useful for creating efficient in-between caches or storing small to mid-sized
high-performance databases

Advantages:

• Faster startup – no spinning

• Significantly faster on random IO (from 250x to 1000x+)

• Extremely low latency (25x to 200x better)

• No noise

• Lower power consumption

• Less heat production

Disadvantages:

• Significantly more expensive ($10-30/GB for Flash based, $100-200/GB for DDR RAM based)

• Slightly slower on large sequential reads

• Slower random write speeds in the case of Flash based storage

RAID Primer

Redundant Array of Inexpensive Disks, or Redundant Array of Independent Disks

(0, 1, 2, 3, 4, 5, 6, TP, 0+1, 10, 50, 60)

Introduction to RAID:

• Allows multiple disks to appear as a single contiguous physical block device

• Uses a pool of disks to store data. Rather than spending billions building special high-capacity
disks, greater capacity is achieved by simply combining PC disks into RAID arrays.

• Typically requires a RAID controller on the host

• Provides redundancy / high availability

• A RAID group appears as a single physical block device

• There are several types/levels of RAID

RAID 0:

• RAID 0 writes blocks to multiple disks without redundancy

• Because the data is written to multiple disks, the controller can work in parallel on both
reads and writes, improving performance

• If any disk fails or an error occurs, data can be lost

• Don’t use it for mission-critical data; use it only for performance

• Ideally you have one drive per controller
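As a rough illustration of striping, the sketch below distributes blocks round-robin across disks. It assumes a stripe unit of exactly one block and uses hypothetical helper names (`locate_block`, `stripe`); real controllers use configurable stripe sizes.

```python
def locate_block(logical_block, num_disks):
    """Map a logical block number to (disk index, block offset on that disk)."""
    return logical_block % num_disks, logical_block // num_disks

def stripe(blocks, num_disks):
    """Distribute a list of blocks round-robin across num_disks disks."""
    disks = [[] for _ in range(num_disks)]
    for i, block in enumerate(blocks):
        disk, _offset = locate_block(i, num_disks)
        disks[disk].append(block)
    return disks

disks = stripe(["A", "B", "C", "D", "E", "F"], num_disks=3)
print(disks)  # [['A', 'D'], ['B', 'E'], ['C', 'F']]
```

Consecutive blocks land on different disks, so they can be read or written in parallel; equally, losing any one disk removes every third block, which is why there is no fault tolerance.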

RAID 1:

• This is mirroring. The same data is written to two disks. If either disk fails, a complete copy of
the data is available on the other disk

• Uses 2X the storage space, but reads can be faster because the OS can pick the disk with
the least seek or rotational latency

RAID-5:

• RAID-5 uses “parity”, or redundant information. If a block fails, enough parity information is
available to recover the data

• The parity information is spread across all the disks

• High read rate, medium write rate

• A disk failure requires a rebuild, in which the parity information is used to re-create the lost data
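The recovery step can be sketched as follows, assuming the parity block is a bytewise XOR of the data blocks in a stripe (the usual single-parity scheme; these notes only say “parity”). The block contents and the `xor_blocks` helper are made up for illustration, and the rotation of parity across disks is not modelled.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data = [b"disk", b"raid", b"blok"]   # three data blocks of one stripe
parity = xor_blocks(data)            # parity block stored on a fourth disk

# Simulate losing the second block and rebuilding it from the survivors plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'raid'
```

Because every surviving block has to be read to rebuild each missing block, a rebuild touches all the disks in the group, which is consistent with the slow rebuilds noted for parity-based levels.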

RAID-10:

• RAID-10 is striping plus mirroring, so you get good performance plus fully mirrored data, at
the expense of 2X the disk space

Storage:

• RAID-5 is a reasonable choice most of the time.

• There are many commodity vendors of RAID arrays

• SCSI RAID arrays are expensive, and their disks are expensive and low in capacity, but
the arrays perform well

• ATA RAID arrays have an excellent price (1/3 to 1/2 that of SCSI drives) and capacity, with somewhat
lower performance

• Apple ATA RAID: 7 TB, $11.5K

• Promise VTrak 15110: $4K plus 15 400 GB SATA disks at $300 each = 6 TB for $8,500

Comparison of Single RAID Levels:

                      RAID 0                  RAID 1                  RAID 5                  RAID 6

Description           Striping                Mirroring               Striping with parity    Striping with dual parity

Minimum disks         2                       2                       3                       4

Maximum disks         Controller dependent    2                       Controller dependent    Controller dependent

Array capacity        No. of drives x         Drive capacity          (No. of drives - 1) x   (No. of drives - 2) x
                      drive capacity                                  drive capacity          drive capacity

Storage efficiency    100%                    50%                     (No. of drives - 1) /   (No. of drives - 2) /
                                                                      No. of drives           No. of drives

Fault tolerance       None                    1 drive failure         1 drive failure         2 drive failures

High availability     None                    Good                    Good                    Very good

Degradation during    NA                      Slight degradation;     High degradation;       Very high degradation;
rebuild                                       rebuilds very fast      slow rebuild (due to    very slow rebuild (due
                                                                      write penalty of        to write penalty of
                                                                      parity)                 dual parity)

Random read           Very good               Good                    Very good               Very good
performance

Random write          Very good               Good (slightly worse    Fair (parity            Poor (dual parity
performance                                   than single drive)      overhead)               overhead)

Sequential read       Very good               Fair                    Good                    Good
performance

Sequential write      Very good               Good                    Fair                    Fair
performance

Cost                  Lowest                  High                    Moderate                Moderate+

Use case              Non-critical data;      Typically used as       Non-write-intensive     Non-write-intensive
                      high speed              RAID 10 in OLTP /       OLTP applications /     OLTP applications /
                      requirements; data      OLAP applications       file servers etc.       file servers etc.
                      backed up elsewhere

Misc                  -                       -                       Parity can considerably Not supported on all
                                                                      slow down system        RAID cards
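The array capacity and storage efficiency rows of the comparison can be written out as a small function. This is a sketch using the formulas and minimum disk counts from the comparison; the function name and the 8 x 500 GB example are assumptions for illustration.

```python
# Minimum drive counts per level, as listed in the comparison.
MIN_DRIVES = {0: 2, 1: 2, 5: 3, 6: 4}

def raid_capacity(level, num_drives, drive_gb):
    """Return (usable capacity in GB, storage efficiency) for RAID 0, 1, 5 or 6."""
    if level not in MIN_DRIVES:
        raise ValueError(f"unsupported RAID level: {level}")
    if num_drives < MIN_DRIVES[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DRIVES[level]} drives")
    if level == 1 and num_drives != 2:
        raise ValueError("RAID 1 is listed with a maximum of 2 drives")
    # Number of drives' worth of space left for data after redundancy.
    data_drives = {0: num_drives, 1: 1, 5: num_drives - 1, 6: num_drives - 2}[level]
    return data_drives * drive_gb, data_drives / num_drives

for level in (0, 5, 6):
    usable, efficiency = raid_capacity(level, num_drives=8, drive_gb=500)
    print(f"RAID {level}: {usable} GB usable, efficiency {efficiency}")
```

With eight drives, RAID 5 gives up one drive's worth of space and RAID 6 two, so efficiency improves as the group grows while fault tolerance stays fixed at one and two drives respectively.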

Comparison of Nested RAID Levels:

                      RAID 10                                 RAID 50

Description           Mirroring then striping                 Striping with parity, then striping
                                                              without parity

Minimum disks         Even number, at least 4                 At least 6

Maximum disks         Controller dependent                    Controller dependent

Array capacity        (Size of drive) * (No. of drives) / 2   (Size of drive) * (No. of drives in each
                                                              RAID 5 set - 1) * (No. of RAID 5 sets)

Storage efficiency    50%                                     (No. of drives in each RAID 5 set - 1) /
                                                              (No. of drives in each RAID 5 set)

Fault tolerance       Multiple drive failures, as long as     Multiple drive failures, as long as
                      2 drives from the same RAID 1 set       2 drives from the same RAID 5 set
                      do not fail                             do not fail

High availability     Excellent                               Excellent

Degradation during    Minor                                   Moderate degradation; slow rebuild
rebuild                                                       (due to write penalty of parity)

Read performance      Very good                               Very good

Write performance     Very good                               Good

Use case              OLTP / OLAP applications                Medium-write-intensive OLTP / OLAP
                                                              applications
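The nested capacity formulas can be checked with a short sketch. It assumes the usual minimums of 4 drives for RAID 10 and two RAID 5 sets of 3 drives for RAID 50; the function names and the 12 x 300 GB example are illustrative.

```python
def raid10_capacity(num_drives, drive_gb):
    """Usable GB for RAID 10: half the drives hold mirror copies."""
    if num_drives < 4 or num_drives % 2 != 0:
        raise ValueError("RAID 10 needs an even number of drives, at least 4")
    return drive_gb * num_drives // 2

def raid50_capacity(drives_per_set, num_sets, drive_gb):
    """Usable GB for RAID 50: each RAID 5 set gives up one drive to parity."""
    if drives_per_set < 3 or num_sets < 2:
        raise ValueError("RAID 50 needs at least 2 RAID 5 sets of at least 3 drives")
    return drive_gb * (drives_per_set - 1) * num_sets

# Twelve 300 GB drives arranged three different ways.
print(raid10_capacity(12, 300))      # 1800
print(raid50_capacity(6, 2, 300))    # 3000
print(raid50_capacity(4, 3, 300))    # 2700
```

For the same twelve drives, fewer but larger RAID 5 sets yield more usable space, while more, smaller sets tolerate failures in more places at once.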

Nested RAID Misc Notes:

• RAID 10 is faster and better than RAID 0+1 for the same cost

• RAID 60 is similar to RAID 50 except that the striped sets with parity contain dual parity

• Ideally RAID 10 and RAID 50 will be the only nested RAID levels you will use

RAID Considerations:

• Select your stripe size by empirical testing

 ▪ A smaller stripe size increases transfer performance and decreases positioning
performance, and vice versa

 ▪ The ideal stripe size depends on your application, the typical amount of data read per
request, the mix of sequential vs random reads, etc.

• Try to select hard drives from separate production batches

• Maintain sufficient spares in a large array (typically 1 per 10-15 disks is sufficient)

• Use global spares across RAID groups if your controller supports it

• Use hardware RAID unless performance is not a consideration

 ▪ Nested RAID levels and parity-based RAID in particular consume more CPU cycles and
increase rebuild time if implemented in software

• Ensure the controller has battery backup to retain its cache in case of power failure

• For internal RAID controller cards use faster PCI buses (PCI-X)

Storage Technologies:

• Secondary or tertiary storage may connect to a computer over a computer network.
This concept does not apply to primary storage, which is shared between multiple
processors to a much lesser degree.

• Direct Attached Storage (DAS) is traditional mass storage that does not use any network.
It is still the most popular approach. The term was coined only recently, together with NAS and
SAN.

• Network Attached Storage (NAS) is mass storage attached to a computer which another
computer can access at file level over a local area network, a private wide area network, or,
in the case of online file storage, over the Internet.

• Storage Area Network (SAN) is a specialized network that provides other computers with
storage capacity. The crucial difference between NAS and SAN is that the former presents and
manages file systems to client computers, whilst the latter provides access at block-addressing
(raw) level, leaving it to the attaching systems to manage data or file systems within
the provided capacity. SAN is commonly associated with Fibre Channel networks.

Passive Disk Enclosure based Direct Attached Storage (PDE based DAS):

• DAS – Direct Attached Storage

• RAID controller inside the host machine

• External chassis is simply a JBOD (Just a Bunch Of Disks)

 ▪ (or what I’d like to call a Passive Disk Enclosure, or PDE), e.g. Dell PowerVault MD1000

• A Passive Disk Enclosure can contain SAS, SATA or FC drives

• Passive Disk Enclosure to RAID controller connectivity can be SAS, FC or SCSI (possibly different
from the backplane)

• Multiple PDEs can be daisy-chained if they support it

• The array of disks can be divided into multiple RAID groups, which may be heterogeneous

• Size and type of a RAID group depend on the RAID card

• A PDE may have multiple paths to the system, with the possibility of multiplexing for increased speed

• Global spares can be defined on the RAID card

• Maximum storage size = maximum number of PDEs that can be daisy-chained x number of drives
per PDE x size of drives

• Performance considerations

 ▪ Drives

 ▪ RAID configuration

 ▪ PDE interconnect

 ▪ PDE to RAID card connection

 ▪ RAID card config (cache etc.)

 ▪ PCI bus

Active Disk Enclosure based Direct Attached Storage (ADE based DAS):

• ADE difference: the RAID card is not in the host machine but in the enclosure

• The host machine has a SAS/FC Host Bus Adaptor (HBA), depending on the ADE to host connectivity
supported

 ▪ Some ADEs may support multiple connection protocols

• An ADE may support SAS/FC/SATA drives

• An ADE can support daisy-chaining PDEs

• Examples of ADEs: Dell MD 3000, Infortrend EonStor devices, Nexsan SATABeast and SATABoy, etc.

• An ADE may support dual RAID controllers

• RAID controllers can be used as active-active (in the case of multiple RAID groups), otherwise
as active-passive

• RAID controller to HBA connectivity can be multiplexed, if supported, for higher throughput

• ADEs are wrongly but commonly referred to as SANs (calling one a SAN device would still be all right)

Partitioning and Mounting:

Logical Volumes:

• A RAID Group is a physical unit of storage

• At the Operating System a Logical Group can be created out of multiple RAID Groups

• Each Logical Group can be further divided into Logical Volumes

• Each Logical Volume represents a mountable block device

• In Linux this is done using LVM (the Logical Volume Manager for the Linux kernel, which manages disk
drives and similar mass-storage devices, in particular large ones)

• In LVM, Logical Volumes are resizable

SAN (Storage Area Network):

• Multiple host machines are connected to an ADE through a SAN switch

• SAN refers to the interconnect + switch + ADE + PDE

• Switch and HBA can be SAS / FC depending on the interconnect type supported by the ADE

• The ADE supports creation of volumes

• These can be mounted onto a client and further subdivided

• Care must be taken to mount each Logical Volume onto a single client (unless you are
running a clustered file system)

• This can be achieved by host masking, supported by the ADE and/or the switch

• Without careful host masking and mounting, data corruption can take place

• Complex SAN configs include multiple hosts and multiple ADEs connected to active-active
switches with multiplexed connections

• Client hosts can run heterogeneous operating systems

• (Oddly, ADE to PDE paths are sometimes not multiplexed)

• While this looks complex, just think of it as removing the hard disks from the machine and
hosting them outside in separate enclosures

• Each machine mounts an independent partition from the SAN

• Performance considerations

 ▪ All the variables we covered before

 ▪ Switch config

 ▪ Ensure that the switch / HBA / interconnect does not become the bottleneck, so that full
hard-disk throughput can be utilized
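The single-client rule for Logical Volumes can be checked mechanically. The sketch below is a toy audit over a list of (host, volume) mappings; it does not talk to any real ADE or switch, and the host names, volume names and `find_shared_volumes` function are all hypothetical.

```python
from collections import defaultdict

def find_shared_volumes(mappings, clustered=frozenset()):
    """Return {volume: sorted hosts} for volumes visible to more than one host.

    mappings  -- iterable of (host, volume) pairs
    clustered -- volumes running a clustered file system, which may be shared
    """
    hosts_by_volume = defaultdict(set)
    for host, volume in mappings:
        hosts_by_volume[volume].add(host)
    return {
        volume: sorted(hosts)
        for volume, hosts in hosts_by_volume.items()
        if len(hosts) > 1 and volume not in clustered
    }

mappings = [
    ("web1", "vol_a"),
    ("db1", "vol_b"), ("db2", "vol_b"),      # two hosts on a non-clustered volume
    ("app1", "vol_c"), ("app2", "vol_c"),    # shared, but clustered
]
conflicts = find_shared_volumes(mappings, clustered={"vol_c"})
print(conflicts)  # {'vol_b': ['db1', 'db2']}
```

Any volume reported here is one where two hosts could write to the same blocks without coordinating, which is the corruption risk that host masking is meant to prevent.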

Throughput Calculations:

• Hard disk performance – Type, RPM etc

• Data distribution and Type of Data access

• RAID performance, number of drives, RAID type

• RAID card performance – cache, active-active config etc

• ADE to switch connection speed

• Switch to HBA connection speed

• HBA to PCI bus speed
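One simple way to reason about this chain is that end-to-end throughput cannot exceed the slowest stage. The sketch below picks out that stage; every figure in it is a made-up placeholder in MB/s, not a measurement or a vendor specification.

```python
def bottleneck(stages):
    """Return (name, MB/s) of the slowest stage in the data path."""
    return min(stages.items(), key=lambda item: item[1])

# Placeholder figures in MB/s, listed in data-path order (assumed, not measured).
stages = {
    "RAID group (drives combined)": 900,
    "RAID controller": 700,
    "ADE to switch link": 400,
    "switch to HBA link": 400,
    "HBA to PCI bus": 1000,
}
name, limit = bottleneck(stages)
print(f"bottleneck: {name} at {limit} MB/s")
```

In this made-up case the drives could deliver more than the links can carry, so adding drives would not help until the interconnect is upgraded or multiplexed.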

Storage Technologies:

Technology: Compact disc, recordable (CD-R) or rewritable (CD-RW), and DVD
Advantages: Low cost per megabyte; unlimited capacity with multiple discs; portable; widely supported I/O interfaces; can be formatted for different data formats; long life; high data density; immune to corruption once data is written (CD-R and DVD only)
Limitations: Limited capacity on one disc (though much greater than diskette); slow to moderate read/write speed
Applications: Data archiving; data distribution; data migration; localized file sharing; offsite storage

Technology: Diskettes, 1.44 MB
Advantages: Simple to use; portable; can be formatted for different data formats
Limitations: Limited capacity; limited read/write speed; not supported by many newer computers
Applications: Local data transfer of small files; storage of small files or programs

Technology: Hard drive, external
Advantages: High read/write speed; can be moved among computers
Limitations: Limited capacity; awkward for data transfer among multiple computers
Applications: Local backup; local archiving

Technology: Hard drive, internal
Advantages: Convenient, usually comes with the computer; high read/write speed; convenient for use with a single computer (but can be shared among multiple computers with proper support); most common form of data storage
Limitations: Limited capacity; without special support, confined to a single computer or server
Applications: Storage in a single computer; swap files

Technology: Removable storage (ZIP disks, JAZ disks, etc.)
Advantages: Simplicity; portability; unlimited capacity with multiple disks; convenient for use with a single computer
Limitations: Proprietary media; limited read/write speed; high cost per megabyte
Applications: Personal computing; local data transfer of small files; local backup; local archiving

The kernel is the part of an operating system that interacts directly with the hardware, which it
does through device drivers. Think of the kernel as a manager that mediates between the other parts
of the operating system and the hardware. It executes the tasks to be done, handles errors and
controls access to your computer.

Its other functions include managing the computer's memory, allocating resources, and communicating
directly with the CPU and with other devices such as printers and flash drives.

It also takes part in booting the computer: after the BIOS has run and passed control to the
bootloader, the kernel is given control and initializes the rest of the system.

The kernel is like the motherboard in your PC: it holds everything together and manages it.

Technology: Solid-state storage (USB devices, flash memory, smart cards, etc.)
Advantages: No mechanical parts; high read/write speed; small form factor
Limitations: Limited storage capacity; high cost per I/O operation
Applications: Swap files; local data transfer; Internet service providers; video processing; relational databases; high-speed data acquisition

Technology: Direct-attached storage (DAS)
Advantages: Simplicity; low initial cost; ease of management
Limitations: Storage for each server must be administered separately; inconvenient for data transfer in network environments; server bears load of processing applications
Applications: Data and application sharing; data backup; data archiving

Technology: Disk library
Advantages: High speed; high storage capacity; high data availability
Limitations: Not as quickly accessible as DAS; intended for "write once, read rarely" data
Applications: Disk-to-disk (D2D) backup; data archiving; near-line storage

Technology: Disk-to-disk-to-tape (D2D2T)
Advantages: Redundancy; high read/write speed; unlimited capacity with multiple tapes
Limitations: Complexity
Applications: Incremental backups; storage virtualization; offsite storage; data archiving

Technology: Fibre Channel (see Storage area network below)
Advantages: Used to transmit data between devices at gigabit speeds; frequently used in storage area networks (SANs); flexible in terms of distance
Limitations: High cost; management complexity
Applications: Large databases; bandwidth-intensive applications; storage area networks (SANs); offsite storage; mission-critical applications

Technology: iSCSI (see Storage area network below)
Advantages: Used to transmit data between devices using the Internet Protocol (IP); frequently used in storage area networks (SANs); more flexible in terms of distance than Fibre Channel (but not as fast)
Limitations: May not compare favorably with Fibre Channel for large database transfers; management complexity
Applications: Applications involving remotely distributed databases; storage area networks (SANs); offsite storage; mission-critical applications

Technology: Magnetic tape
Advantages: Low cost per megabyte; portability; unlimited capacity with multiple tapes
Limitations: Inconvenient for quick recovery of individual files or groups of files
Applications: Data archiving; limited-budget businesses; offsite storage

Technology: Network-attached storage (NAS)
Advantages: Fast file access for multiple clients; ease of data sharing; high storage capacity; redundancy; ease of drive mirroring; consolidation of resources
Limitations: Less convenient than a storage area network (SAN) for moving large blocks of data
Applications: Data backup; data archiving; redundant storage

Technology: Redundant array of independent disks (RAID)
Advantages: High speed; high storage capacity; high data availability; high reliability; security; fault tolerance
Limitations: Users may develop a false sense of security; recovery from failure is difficult in some systems; high cost for optimum systems
Applications: Swap files; Internet service providers; redundant storage

Technology: Storage area network (SAN)
Advantages: Excellent for moving large blocks of data; exceptional reliability; wide availability; fault tolerance; scalability
Limitations: High cost; lack of standardization; management complexity
Applications: Large databases; bandwidth-intensive applications; mission-critical applications
