
Homework Title/No.

1 Course Code: CAP-559

Course Instructor: Sr. Lect. Ginnia Kakkar Course Tutor (if applicable):

Date of Allotment : Date of submission : 07-09-10

Student’s Roll number : RA3803A05 Section No.: A3803

Declaration:

I declare that this assignment is my individual work. I have not
copied from any other student’s work or from any other source except
where due acknowledgement is made explicitly in the text, nor has any
part been written for me by another person.

Student’s Sign: Balwinder Singh

Evaluator’s Comments:
___________________________________________________

Marks obtained: ____________ out of __________________


PART A
Q.1. A hospital uses an application that stores patient X-ray data in
the form of large binary objects in an Oracle database. The
application is hosted on a UNIX server, and the hospital staff accesses
X-ray records through a Gigabit Ethernet backbone. A storage array
with 6 terabytes of usable capacity provides storage to the UNIX
server. How would you design your data center? Give a complete
description, with specifications, of the components used in the data
center.
Ans : Five core elements are essential for the basic functionality of a data
center:
• Application: An application is a computer program that provides the
logic for computing operations. Applications, such as an order
processing system, can be layered on a database, which in turn uses
operating system services to perform read/write operations to storage
devices.
• Database: More commonly, a database management system (DBMS)
provides a structured way to store data in logically organized tables
that are interrelated. A DBMS optimizes the storage and retrieval of
data.
• Server and operating system: A computing platform that runs
applications and databases.
• Network: A data path that facilitates communication between clients
and servers or between servers and storage.
• Storage array: A device that stores data persistently for subsequent
use.
These core elements are typically viewed and managed as separate entities,
but all the elements must work together to address data processing
requirements.
The figure below shows an example of an order processing system that involves
the five core elements of a data center and illustrates their functionality in a
business process.

A customer places an order through the AUI of the order processing
application software located on the client computer.
The client connects to the server over the LAN and accesses the DBMS
located on the server to update the relevant information, such as the customer
name, address, payment method, products ordered, and quantity ordered.
The DBMS uses the server operating system to read and write this data to the
database located on physical disks in the storage array.
The storage network provides the communication link between the server
and the storage array and transports the read or write commands between
them.
The storage array, after receiving the read or write commands from the
server, performs the necessary operations to store the data on physical disks.
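The write path just described, with each element talking only to the one below it, can be modelled very loosely as a chain of objects. This is a minimal Python sketch under stated assumptions: the class names (`StorageArray`, `StorageNetwork`, `ServerOS`, `DBMS`, `OrderApplication`) are hypothetical, a Python dict stands in for the physical disks, and the LAN hop between client and server is reduced to a method call.

```python
from dataclasses import dataclass, field


@dataclass
class StorageArray:
    # Hypothetical stand-in for physical disks: block address -> bytes.
    disks: dict = field(default_factory=dict)

    def execute(self, op: str, address: int, data: bytes = b"") -> bytes:
        # Carry out a read or write command received over the storage network.
        if op == "write":
            self.disks[address] = data
            return b""
        return self.disks.get(address, b"")


@dataclass
class StorageNetwork:
    # Transports read/write commands between the server and the array.
    array: StorageArray

    def send(self, op: str, address: int, data: bytes = b"") -> bytes:
        return self.array.execute(op, address, data)


@dataclass
class ServerOS:
    # Operating system services that the DBMS uses for block I/O.
    network: StorageNetwork

    def write_block(self, address: int, data: bytes) -> None:
        self.network.send("write", address, data)

    def read_block(self, address: int) -> bytes:
        return self.network.send("read", address)


@dataclass
class DBMS:
    # Maps logical keys (order IDs) to block addresses.
    os: ServerOS
    next_address: int = 0
    index: dict = field(default_factory=dict)

    def insert(self, key: int, row: str) -> None:
        self.os.write_block(self.next_address, row.encode())
        self.index[key] = self.next_address
        self.next_address += 1

    def select(self, key: int) -> str:
        return self.os.read_block(self.index[key]).decode()


@dataclass
class OrderApplication:
    # Order processing logic layered on the DBMS.
    dbms: DBMS

    def place_order(self, order_id: int, customer: str, product: str, qty: int) -> None:
        self.dbms.insert(order_id, f"{customer},{product},{qty}")


array = StorageArray()
app = OrderApplication(DBMS(ServerOS(StorageNetwork(array))))
app.place_order(1, "Alice", "widget", 3)
print(array.disks)  # {0: b'Alice,widget,3'}
```

Because each layer depends only on the interface of the layer beneath it, the array or the network could be replaced without touching the application, which is why the elements can be managed separately yet still have to work together.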
Key Requirements for Data Center Elements
Uninterrupted operation of data centers is critical to the survival and success
of a business. It is necessary to have a reliable infrastructure that ensures
data is accessible at all times. While the requirements, shown in Figure 1-6,
are applicable to all elements of the data center infrastructure, our focus here
is on storage systems.

Key characteristics of data center elements

• Availability: All data center elements should be designed to ensure
accessibility. The inability of users to access data can have a significant
negative impact on a business.
• Security: Policies, procedures, and proper integration of the data
center core elements that will prevent unauthorized access to
information must be established. In addition to the security measures
for client access, specific mechanisms must enable servers to access
only their allocated resources on storage arrays.
• Scalability: Data center operations should be able to allocate
additional processing capabilities or storage on demand, without
interrupting business operations. Business growth often requires
deploying more servers, new applications, and additional databases.
The storage solution should be able to grow with the business.
• Performance: All the core elements of the data center should be able
to provide optimal performance and service all processing requests at
high speed. The infrastructure should be able to support performance
requirements.
• Data integrity: Data integrity refers to mechanisms, such as error
correction codes or parity bits, which ensure that data is written to disk
exactly as it was received. Any variation in data during its retrieval
implies corruption, which may affect the operations of the organization.
• Capacity: Data center operations require adequate resources to store
and process large amounts of data efficiently. When capacity
requirements increase, the data center must be able to provide
additional capacity without interrupting availability, or, at the very
least, with minimal disruption. Capacity may be managed by
reallocation of existing resources, rather than by adding new
resources.
• Manageability: A data center should perform all operations and
activities in the most efficient manner. Manageability can be achieved
through automation and the reduction of human (manual) intervention
in common tasks.
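The parity-bit mechanism mentioned under data integrity can be made concrete with single XOR parity, the scheme used by parity RAID levels such as 3 and 5 (error correction codes are not shown). This is a minimal Python sketch that assumes equal-length blocks; the function names are hypothetical. Parity detects that a stripe no longer matches its data and can rebuild one lost block whose position is known, but on its own it cannot tell which block is corrupt.

```python
from functools import reduce


def xor_blocks(blocks):
    # Byte-wise XOR across equal-length blocks.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    return xor_blocks(data_blocks)


def verify(data_blocks, parity):
    # True while the stored parity still matches the data.
    return xor_blocks(data_blocks) == parity


def rebuild(surviving_blocks, parity):
    # XOR of the surviving blocks and the parity yields the one missing block.
    return xor_blocks(list(surviving_blocks) + [parity])


data = [b"XRAY", b"DATA", b"0001"]
parity = make_parity(data)
print(verify(data, parity))                          # True
print(rebuild([data[0], data[2]], parity))           # b'DATA'
print(verify([data[0], b"DATB", data[2]], parity))   # False
```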
Example : CLARiiON Systems

Q.2. For the hospital scenario in question 1, what challenges can your
storage management team face in meeting the service-level
demands of hospital staff?
Ans :
Key Challenges in Managing Information
In order to frame an effective information management policy, businesses
need to consider the following key challenges of information management:
• Exploding digital universe: The rate of information growth is
increasing exponentially. Duplication of data to ensure high availability
and repurposing has also contributed to the multifold increase of
information growth.
• Increasing dependency on information: The strategic use of
information plays an important role in determining the success of a
business and provides competitive advantages in the marketplace.
• Changing value of information: Information that is valuable today
may become less important tomorrow. The value of information often
changes over time.
Framing a policy to meet these challenges involves understanding the value
of information over its lifecycle.
Other challenges can be:
• Monitoring is the continuous collection of information and the review
of the entire data center infrastructure. The aspects of a data center
that are monitored include security, performance, accessibility, and
capacity.
• Reporting is done periodically on resource performance, capacity, and
utilization. Reporting tasks help to establish business justifications and
chargeback of costs associated with data center operations.
• Provisioning is the process of providing the hardware, software, and
other resources needed to run a data center. Provisioning activities
include capacity and resource planning.
• Capacity planning ensures that the user’s and the application’s
future needs will be addressed in the most cost-effective and
controlled manner.
• Resource planning is the process of evaluating and identifying
required resources, such as personnel, the facility (site), and the
technology. Resource planning ensures that adequate resources are
available to meet user and application requirements.
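Capacity planning usually starts with a growth projection. Below is a minimal Python sketch that assumes compound weekly growth at a constant rate; the function names and the example figures (400 GB used, a 500 GB volume, 5% growth per week, 20% headroom) are hypothetical and are not measurements from this scenario.

```python
import math


def weeks_until_full(used_gb: float, capacity_gb: float, weekly_growth: float) -> int:
    # Smallest whole number of weeks w with used * (1 + g)**w >= capacity.
    if used_gb >= capacity_gb:
        return 0
    if weekly_growth <= 0:
        raise ValueError("with no growth the volume never fills")
    return math.ceil(math.log(capacity_gb / used_gb) / math.log(1 + weekly_growth))


def capacity_needed(used_gb: float, weekly_growth: float, weeks: int,
                    headroom: float = 0.2) -> float:
    # Projected usage after `weeks` of growth, plus a safety margin (headroom).
    return used_gb * (1 + weekly_growth) ** weeks * (1 + headroom)


print(weeks_until_full(400, 500, 0.05))        # 5
print(round(capacity_needed(400, 0.05, 52)))   # GB to provision for one year
```

The logarithm simply inverts the compound-growth formula; a real plan would fit the growth rate to monitored usage rather than assume it.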

Another challenge is adopting eco-friendly server, network, and storage
equipment. In particular, you need equipment that:
• Reduces power consumption and heat output
• Can be configured to reduce cooling requirements in data centers
• Occupies the smallest possible data center footprint
• Meets environmental directives regarding construction and recycling

Q.3. The marketing department of a midsize firm is expanding. New
hires are being added to the department and they are given access
to the department’s files. IT has given marketing a networked drive
on the LAN, but it keeps reaching capacity every third week. Current
capacity is 500GB (and growing) with hundreds of files. Users are
complaining about LAN response times and capacity. As IT manager,
what could you recommend to improve the situation?
Ans :

First solution is EMC Symmetrix V-Max


EMC Symmetrix VMAX provides high-end storage for the virtual data center.
Symmetrix VMAX scales up to 2 PB of usable protected capacity and
consolidates more workloads with a much smaller footprint than alternative
arrays.
Its innovative EMC Symmetrix Virtual Matrix Architecture™ seamlessly scales
performance, capacity, and connectivity on demand to meet all application
requirements. Symmetrix VMAX can be deployed with Flash Drives, Fibre
Channel, and Serial Advanced Technology Attachment (SATA) drives, with
tiering fully automated with FAST. It supports virtualized and physical servers,
including open systems, mainframe, and IBM i hosts.
Features and benefits:
• Symmetrix FAST: Automate storage tiering to lower costs and deliver
higher service levels.
• Linear scale-out of storage resources: Consolidate multiple arrays into
a single Symmetrix V-Max system.
• Up to 2 PB usable capacity: Seamlessly scale from 48 to 2,400 drives.
• 1 to 8 V-Max engine scaling: Consolidate more workloads in a smaller
footprint with up to eight highly available Symmetrix VMAX engines.
• Virtual logical unit number (LUN) technology: Transparently move data
to the right tiers and redundant array of independent disks (RAID)
types at the right time.
• Virtual provisioning: Efficiently allocate, grow, and reclaim storage
with ease.
• Extended distance protection: Replicate data over extended distances
and achieve zero data loss protection.
• Information-centric security: Get advanced RSA security technology,
built in rather than bolted on, to keep your data safe, reduce risk, and
improve compliance.
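Virtual provisioning is EMC's name for thin provisioning: hosts see the full logical size of a volume, but physical extents are taken from a shared pool only when data is first written, and they can be returned to the pool when reclaimed. The following is a generic Python sketch of that idea under stated assumptions; `ThinPool` and `ThinVolume` are hypothetical classes and do not model EMC's implementation or its extent size.

```python
class ThinPool:
    # Shared pool of physical extents (hypothetical, generic model).
    def __init__(self, physical_extents: int):
        self.free = physical_extents

    def allocate(self) -> None:
        if self.free == 0:
            raise RuntimeError("thin pool exhausted")
        self.free -= 1

    def release(self, count: int) -> None:
        self.free += count


class ThinVolume:
    def __init__(self, pool: ThinPool, logical_extents: int):
        self.pool = pool
        self.logical_extents = logical_extents  # size reported to the host
        self.mapped = set()                     # extents with physical backing

    def write(self, extent: int) -> None:
        if not 0 <= extent < self.logical_extents:
            raise IndexError("write beyond volume size")
        if extent not in self.mapped:           # allocate on first write only
            self.pool.allocate()
            self.mapped.add(extent)

    def reclaim(self, extents) -> None:
        # Return extents the host no longer uses to the shared pool.
        freed = self.mapped.intersection(extents)
        self.mapped -= freed
        self.pool.release(len(freed))


pool = ThinPool(100)                                  # 100 physical extents
volumes = [ThinVolume(pool, 80) for _ in range(3)]    # 240 logical extents
for extent in range(10):
    volumes[0].write(extent)
volumes[0].write(0)              # rewrite: no new allocation
print(pool.free)                 # 90
volumes[0].reclaim(range(5))
print(pool.free)                 # 95
```

The three volumes advertise 240 extents against 100 physical ones; this oversubscription is what makes thin provisioning efficient, and also why pool usage has to be monitored.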

Second Solution is EMC Symmetrix DMX-4

EMC Symmetrix DMX-4 enables you to manage and protect all of your data—
more than 1 petabyte of storage—and keep it available at all times.
Symmetrix DMX-4 provides customized Flash drives that break the
performance barriers of traditional disk technology because they are
optimized to meet high-end storage requirements. DMX-4 also delivers built-in
RSA security technology to keep your critical data safe, as well as high
availability to ensure constant data access. Best of all, the DMX-4 is energy
efficient and easy to manage.
Features and benefits:
• Seamless scalability up to 1 PB: Choose the capacity and performance
that fits your needs, from 96 to 2,400 drives.
• Information-centric security: Get advanced RSA security technology,
built in rather than bolted on, to keep your data safe, reduce risk, and
improve compliance.
• Tiered storage with quality of service (QoS): Optimize service levels
and reduce costs with Flash, Fibre Channel, and Serial Advanced
Technology Attachment (SATA) drives.
• Data mobility: Keep your information highly mobile by copying and
moving data quickly and easily between storage tiers, platforms, and
sites.
• System management: Rely on EMC Symmetrix Management Console, an
intuitive, browser-based interface, to manage configuration and
replication tasks and monitor system activities.
• Fully automated storage tiering (FAST): Automate storage tiering to
lower costs and deliver higher service levels.
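Both Symmetrix products list fully automated storage tiering (FAST). EMC's actual FAST policies and statistics are not described here, so the following is only a generic Python sketch of the underlying idea, with hypothetical names and numbers: rank extents by recent access count, place the hottest on Flash, the next on Fibre Channel, and the rest on SATA, then report which extents would have to move.

```python
def plan_tiers(access_counts: dict, flash_slots: int, fc_slots: int) -> dict:
    # Rank extents from hottest to coldest (ties keep their input order).
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    placement = {}
    for rank, extent in enumerate(ranked):
        if rank < flash_slots:
            placement[extent] = "flash"
        elif rank < flash_slots + fc_slots:
            placement[extent] = "fc"
        else:
            placement[extent] = "sata"
    return placement


def moves(current: dict, target: dict) -> dict:
    # Extents whose tier changes: extent -> (from_tier, to_tier).
    return {e: (current.get(e), t) for e, t in target.items() if current.get(e) != t}


counts = {"ext-a": 500, "ext-b": 5, "ext-c": 120, "ext-d": 0}
target = plan_tiers(counts, flash_slots=1, fc_slots=1)
print(moves({e: "sata" for e in counts}, target))
# {'ext-a': ('sata', 'flash'), 'ext-c': ('sata', 'fc')}
```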

PART B
Q.4. Choose two RAID products, one offered by EMC and the other offered
by IBM, for similar requirements, and compare and contrast the
features supported.
Ans :

FEATURES: IBM XIV vs. EMC CLARiiON and Symmetrix

• Virtualization
  IBM XIV: The system is inherently virtualized. This makes the system
  very easy to manage.
  EMC CLARiiON and Symmetrix: Capacity allocation requires awareness of
  system components such as disks, metavolumes, spares, tiers, and more.
  This makes system management more complex and time consuming.
• Performance
  IBM XIV: All volumes are spread over all disks. Thus, all disks handle
  equal workloads, eliminating disk hot spots.
  EMC CLARiiON and Symmetrix: The system is susceptible to disk hot spots,
  which can significantly degrade performance. Frequent manual tuning is
  often required.
• Rebuild time
  IBM XIV: Revolutionary protection scheme ensures 30-minute rebuild time
  (or less) for 1TB disks, minimizing the likelihood of a double disk
  failure.
  EMC CLARiiON and Symmetrix: Rebuild time can take hours, increasing the
  possibility of additional disk failures that can lead to data loss.
• Snapshots
  IBM XIV: Up to 16,000 snapshots in one system; no fixed per-volume
  limits.
  EMC CLARiiON and Symmetrix: Limited to far fewer snapshots.
• Migration
  IBM XIV: Built-in capability; minimal disruption as XIV is inserted
  between hosts and the older disk systems being replaced.
  EMC CLARiiON and Symmetrix: No comparable capability in current system
  models.
• Costs
  IBM XIV: Hardware: Design innovations deliver high performance using
  lower-cost/high capacity disks. Software: Management software is
  included at no extra charge. Snapshots/clones, thin provisioning, and
  metro distance mirroring are all standard at no extra charge.
  EMC CLARiiON and Symmetrix: Hardware: A larger number of disks and/or
  more costly, faster disks may be required to match XIV’s performance.
  Software: Management software is separately priced. Optional feature
  costs each have base charges, per-terabyte charges, additional charges
  when disk capacity is added to the system, and post-warranty
  maintenance charges.
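The performance and rebuild rows rest on one idea: XIV spreads each volume's data in small partitions across every disk instead of confining it to one RAID group. The Python sketch below illustrates only that idea; the hash-based `place` function, the 180-disk system, and the 18,000-partition volume are hypothetical, and XIV's real distribution algorithm and mirroring are not modelled. It compares the per-disk load of one busy volume spread across all disks with the load when the same volume sits on a five-disk RAID group.

```python
import hashlib
from collections import Counter

NUM_DISKS = 180        # hypothetical system size
PARTITIONS = 18_000    # hypothetical partitions in one busy volume


def place(volume: str, partition: int, num_disks: int) -> int:
    # Pseudo-random but repeatable disk choice derived from a hash.
    digest = hashlib.sha256(f"{volume}:{partition}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_disks


def spread_load(volume: str, partitions: int, num_disks: int) -> Counter:
    # Partitions per disk when the volume is spread over every disk.
    return Counter(place(volume, p, num_disks) for p in range(partitions))


def raid_group_load(partitions: int, group_disks: int) -> Counter:
    # Partitions per disk when the volume is confined to one RAID group.
    return Counter(p % group_disks for p in range(partitions))


spread = spread_load("xray", PARTITIONS, NUM_DISKS)
grouped = raid_group_load(PARTITIONS, 5)
print(len(spread), max(spread.values()))    # disks used, busiest disk
print(len(grouped), max(grouped.values()))  # 5 3600
```

With every disk holding a similar share, a failed disk's data can also be re-protected from many disks in parallel, which is the usual explanation for the short rebuild times claimed in the comparison.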

FEATURES: IBM DS8000 vs. EMC Symmetrix

• Flexible duration warranties
  IBM DS8000: YES (1, 2, 3, or 4 years, customer choice, covers hardware
  and licensed software)
  EMC Symmetrix: NO (DMX and V-Max 3-year standard hardware warranty; most
  licensed software has only a 90-day standard warranty covering media
  defects, not “bugs”)
• Published industry-standard performance benchmarks (SPC-1 and SPC-2)
  IBM DS8000: YES (selected models)
  EMC Symmetrix: NO
• Comprehensive data striping helps significantly reduce performance
  problems
  IBM DS8000: YES (Storage Pool Striping)
  EMC Symmetrix: Limited
• Asynchronous remote copy design can reduce data loss
  IBM DS8000: YES (sends data quickly, reducing data loss to as little as
  3-5 seconds)
  EMC Symmetrix: NO (delays sending data, leading to more data loss)
• Efficient use of cache
  IBM DS8000: YES
  EMC Symmetrix: NO (Only ½ of cache is “effective”; large cache slots
  waste valuable cache space when handling small, random I/Os)
• Self-encrypting disks
  IBM DS8000: YES
  EMC Symmetrix: NO (host-based encryption that can degrade application
  performance; supports only some hosts)
• System is nondisruptively upgradable in-place from smallest to largest
  configuration
  IBM DS8000: YES (DS8700)
  EMC Symmetrix: NO (DMX-4 950 cannot be upgraded to DMX-4; V-Max SE
  cannot be upgraded to V-Max)
• Better IBM System i, p, z synergy
  IBM DS8000: YES (generally quicker to market with support for new
  models, OS releases, and function)
  EMC Symmetrix: NO (generally delayed support at best)

FEATURES: IBM DS4000/DS5000 vs. EMC CLARiiON (IBM / EMC)

• Dynamic mode change (Synch to Asynch) for remote mirroring (Global and
  Metro): YES / NO
• Multi-pathing software included in base price: YES / NO
• Published high performance ISV and industry standard benchmarks:
  YES / NO
• Dynamic conversion of entire array groups from any RAID level to any
  RAID level: YES / NO
• Mix and match all disk capacities and types (SATA or Fibre) within the
  same disk drawer: YES / NO
• Single source for server, disk, tape, software, services, and
  financing: YES / NO
• Encrypted disk drives for data at rest with no performance impact:
  YES / NO
• 8 Gbps fibre channel host interface cards: YES / NO
• Switched disk drive architecture: YES / NO
• Call Home Autonomic Service Calling: YES / NO
• No additional fees for software maintenance: YES / NO

Q.5. Analyse the impact of random and sequential I/O in different
RAID configurations.
Ans : Sequential I/Os have such a huge performance advantage over random
I/Os that the computer industry has labored over the past few decades to
reduce random I/Os and convert them to sequential I/Os with
techniques such as caching, transaction logging, sorting, and
log-structured file systems.

Granted, the I/O path I was working with was not a traditional hard disk. It
was a LUN presented from a SAN with a large amount of cache, and to
simplify to some extent, the LUN was a RAID 0 stripe set across 12 virtualized
drives with a rather large stripe unit size (960K). But how should I explain
why 8K random I/Os could outperform 8K sequential I/Os?

After some discussions with a storage professional, we came up with a theory
consisting of the following three key factors:
• Random I/Os were effectively hashed across the multiple drives
that make up the RAID 0 device.
• The relatively large RAID 0 stripe unit size of 960K caused 8K sequential
I/Os to cluster on the same drive. Note that it takes 120
sequential 8K I/Os to fill a single 960K stripe unit.
• A base amount of cache was assigned to each drive in RAID 0, so
when random I/Os were hashed across 12 drives, the I/Os benefited
from a larger amount of cache.
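The stripe-unit argument can be checked with the standard RAID 0 address mapping, drive = (offset // stripe unit) mod number of drives. The Python sketch below uses the figures from the text (12 drives, 960K stripe unit, 8K I/Os); the 100 GiB LUN size and the random seed are assumptions, and the cache effect (the third factor) is not modelled.

```python
import random

KB = 1024
STRIPE_UNIT = 960 * KB   # stripe unit size from the text
NUM_DRIVES = 12          # members of the RAID 0 set
IO_SIZE = 8 * KB         # 8K I/Os
LUN_SIZE = 100 * KB**3   # assumed 100 GiB LUN


def drive_for(offset: int) -> int:
    # RAID 0: consecutive stripe units rotate across the member drives.
    return (offset // STRIPE_UNIT) % NUM_DRIVES


def drives_touched(offsets) -> set:
    return {drive_for(o) for o in offsets}


sequential = [i * IO_SIZE for i in range(120)]   # 120 back-to-back 8K I/Os
rng = random.Random(42)
random_ios = [rng.randrange(LUN_SIZE // IO_SIZE) * IO_SIZE for _ in range(120)]

print(drives_touched(sequential))        # {0}
print(drive_for(120 * IO_SIZE))          # 1: the 121st I/O moves to the next drive
print(len(drives_touched(random_ios)))   # number of distinct drives hit
```

All 120 sequential I/Os land on drive 0 because 120 × 8K equals exactly one 960K stripe unit, while the random I/Os are scattered across the set.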

Summary Comparison of RAID Levels

Below you will find a table that summarizes the key quantitative attributes of
the various RAID levels for easy comparison. Be sure to read the notes that
follow the table:

RAID Level  Number of Disks  Capacity         Storage Efficiency  Cost
0           2,3,4,...        S*N              100%                $
1           2                S*N/2            50%                 $$
2           many             varies, large    ~70-80%             $$$$$
3           3,4,5,...        S*(N-1)          (N-1)/N             $$
4           3,4,5,...        S*(N-1)          (N-1)/N             $$
5           3,4,5,...        S*(N-1)          (N-1)/N             $$
6           4,5,6,...        S*(N-2)          (N-2)/N             $$$
7           varies           varies           varies              $$$$$
01/10       4,6,8,...        S*N/2            50%                 $$$
03/30       6,8,9,10,...     S*N0*(N3-1)      (N3-1)/N3           $$$$
05/50       6,8,9,10,...     S*N0*(N5-1)      (N5-1)/N5           $$$$
15/51       6,8,10,...       S*((N/2)-1)      ((N/2)-1)/N         $$$$$

Notes on the table:

• For the number of disks, the first few valid sizes are shown; you can
figure out the rest from the examples given in most cases. Minimum
size is the first number shown; maximum size is normally dictated by
the controller. RAID 01/10 must have an even number of drives
(minimum 4), and RAID 15/51 must have an even number of drives
(minimum 6). RAID 03/30 and 05/50 can only have sizes that
are a product of integers, minimum 6.
• For capacity and storage efficiency, "S" is the size of the
smallest drive in the array, and "N" is the number of drives in the
array. For RAID 03 and 30, "N0" is the width of the RAID 0
dimension of the array, and "N3" is the width of the RAID 3 dimension.
So a 12-disk RAID 30 array made by creating three 4-disk RAID 3
arrays and then striping them would have N3=4 and N0=3. The same
applies for "N5" in the RAID 05/50 row.
• Storage efficiency assumes all drives are of identical size. If this is not
the case, the universal computation (array capacity divided by the sum
of all drive sizes) must be used.
• Performance rankings are approximations and, to some extent, reflect
my personal opinions. Please don't over-emphasize a "half-star"
difference between two scores!
• Cost is relative and approximate, of course. In the real world it will
depend on many factors; the dollar signs are just intended to provide
some perspective.
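The capacity formulas in the summary table can be turned into a small calculator. This Python sketch follows the table's definitions (S = smallest drive, N = number of drives, N3 or N5 = width of the RAID 3 or RAID 5 dimension, N0 = N / that width). The function names and level labels are hypothetical, RAID 2 and 7 are left out because their capacity "varies", and formatting overhead and hot spares are ignored.

```python
def usable_capacity(level: str, drives: int, size: float, group_width: int = 0) -> float:
    # size is S (smallest drive); group_width is N3 or N5 for nested parity levels.
    n, s = drives, size
    if level == "0":
        return s * n
    if level in ("1", "01/10"):
        if n % 2:
            raise ValueError("mirrored levels need an even number of drives")
        return s * n / 2
    if level in ("3", "4", "5"):
        return s * (n - 1)
    if level == "6":
        return s * (n - 2)
    if level in ("03/30", "05/50"):
        if group_width < 3 or n % group_width:
            raise ValueError("drives must equal N0 * group_width, group_width >= 3")
        n0 = n // group_width              # width of the RAID 0 dimension
        return s * n0 * (group_width - 1)
    if level == "15/51":
        if n % 2:
            raise ValueError("RAID 15/51 needs an even number of drives")
        return s * (n / 2 - 1)
    raise ValueError(f"level {level!r} is not covered by this sketch")


def efficiency(level: str, drives: int, size: float, group_width: int = 0) -> float:
    # Usable capacity divided by the sum of all drive sizes (identical drives).
    return usable_capacity(level, drives, size, group_width) / (drives * size)


print(usable_capacity("03/30", 12, 1.0, group_width=4))  # 9.0 (N0=3, N3=4)
print(efficiency("03/30", 12, 1.0, group_width=4))       # 0.75
print(usable_capacity("5", 7, 1.0), usable_capacity("01/10", 12, 1.0))  # 6.0 6.0
```

The last line applies the formulas to the 6 TB of usable capacity in the hospital scenario, assuming hypothetical 1 TB drives: RAID 5 needs 7 drives, while RAID 1+0 needs 12.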
Q.6. Based on the average number of IOPS, compare the performance
offered by RAID levels 1+0, 0+1, 3 and 5.
Ans : Every disk in your storage system has a maximum theoretical IOPS
value that can be calculated with a formula. Disk performance, and therefore IOPS, is based
on three key factors:
• Rotational speed (also known as spindle speed). Measured in revolutions
per minute (RPM), most disks you’ll consider for enterprise storage
rotate at speeds of 7,200, 10,000 or 15,000 RPM, with the latter two
being the most common. A higher rotational speed is associated with
a higher-performing disk. This value is not used directly in
calculations, but it is highly important. The other two values depend
heavily on the rotational speed, so I’ve included it for completeness.
• Average latency. The time it takes for the sector of the disk being
accessed to rotate into position under a read/write head.
• Average seek time. The time (in ms) it takes for the hard drive’s
read/write head to position itself over the track being read or written.
There are both read and write seek times; take the average of the
two values.

To calculate the IOPS range, use this formula: Average IOPS = 1 / (average
latency in seconds + average seek time in seconds). Equivalently, with both
times in ms: Average IOPS = 1000 / (average latency in ms + average seek time in ms).

Sample drive:
• Model: Western Digital VelociRaptor 2.5″ SATA hard drive
• Rotational speed: 10,000 RPM
• Average latency: 3 ms (0.003 seconds)
• Average seek time: (4.2 ms read + 4.7 ms write) / 2 = 4.45 ms (0.00445 seconds)
• Calculated IOPS for this disk: 1/(0.003 + 0.00445) = about 134 IOPS
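The sample calculation can be reproduced in a few lines. This Python sketch also derives the 3 ms average latency from the spindle speed, using the standard approximation that, on average, the target sector is half a revolution away (0.5 × 60,000 ms / RPM). The function names are hypothetical.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    # On average the sector is half a revolution away: 0.5 * (60,000 ms / RPM).
    return 0.5 * 60_000 / rpm


def disk_iops(avg_latency_ms: float, avg_seek_ms: float) -> float:
    # Average IOPS = 1 / (latency + seek), with both times converted to seconds.
    return 1 / ((avg_latency_ms + avg_seek_ms) / 1000)


latency = avg_rotational_latency_ms(10_000)   # sample drive spins at 10,000 RPM
seek = (4.2 + 4.7) / 2                         # average of read and write seek, ms
iops = disk_iops(latency, seek)

print(latency)       # 3.0
print(round(iops))   # 134
```

For a 15,000 RPM disk the same rule gives 2 ms of average latency, which is one reason faster spindles yield more IOPS.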

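In RAID levels 1+0 and 0+1:
1 read operation = 1 disk I/O
1 write operation = 2 disk I/Os (one to each copy of the mirror)

In RAID level 3:
1 read operation = 1 disk I/O
1 write operation = 4 disk I/Os

In RAID level 5:
1 read operation = 1 disk I/O
1 write operation = 4 disk I/Os (read old data, read old parity, write new data, write new parity)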

The actual disk IOPS required to provide the requested performance:
• Disk IOPS = (Read % * Required IOPS) +
(Write % * RAID write penalty * Required IOPS), where the RAID 5 write penalty is 4

The chart below summarizes the read and write RAID penalties for the most
common RAID levels.

Parity-based RAID systems also introduce additional processing that
results from the need to calculate parity information. The more parity
protection you add to a system, the more processing overhead you incur. As
you might expect, the overall imposed penalty is very dependent on the
balance between read and write workloads.

A good starting-point formula is below. This formula does not use the array
IOPS value; it uses a workload IOPS value that you would derive on your own
or by using some kind of calculation tool, such as the Exchange Server
calculator.

(Total Workload IOPS * Percentage of workload that is read operations) +
(Total Workload IOPS * Percentage of workload that is write operations *
RAID IO Penalty)

The RAID 5 write penalty in a 4+1 RAID group is 4, while the RAID 10 write
penalty is 2.
Before you even put this in a spreadsheet, you know what it will tell you:
• In a 100% read-only environment, RAID 5 and RAID 10 will give the
same performance. RAID 5 may use fewer disks to do it, but not
necessarily.
• In a 100% write-only environment, RAID 5 will require twice as many
disk IOPS and almost twice the number of disks.
• Anywhere between those two extremes, the more writes the workload
contains, the fewer disks RAID 10 needs, relative to RAID 5, to achieve
the same performance.
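The three cases above follow directly from the workload formula. This Python sketch applies it with the write penalties listed earlier (2 for RAID 1+0 and 0+1, 4 for RAID 3 and 5). The function names, the 2000 IOPS workload, and the 134 IOPS per disk (roughly the sample-drive figure) are assumptions for illustration, and mirrored layouts would in practice be rounded up to an even number of disks.

```python
import math

# Write penalties from the list above: disk I/Os per host write.
WRITE_PENALTY = {"1+0": 2, "0+1": 2, "3": 4, "5": 4}


def backend_iops(workload_iops: float, read_percent: int, level: str) -> float:
    # Reads cost one disk I/O each; writes cost the RAID write penalty.
    if not 0 <= read_percent <= 100:
        raise ValueError("read_percent must be between 0 and 100")
    reads = workload_iops * read_percent / 100
    writes = workload_iops * (100 - read_percent) / 100
    return reads + writes * WRITE_PENALTY[level]


def disks_needed(workload_iops: float, read_percent: int, level: str,
                 iops_per_disk: float) -> int:
    return math.ceil(backend_iops(workload_iops, read_percent, level) / iops_per_disk)


for read_percent in (100, 70, 0):
    row = {lvl: backend_iops(2000, read_percent, lvl) for lvl in WRITE_PENALTY}
    print(read_percent, row)
print(disks_needed(2000, 0, "1+0", 134), disks_needed(2000, 0, "5", 134))  # 30 60
```

At 100% reads every level needs the same 2000 disk IOPS; at 100% writes RAID 5 (and RAID 3) needs 8000 against 4000 for the mirrored levels, which reproduces the "twice as many disk IOPS" conclusion.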

If we stop there, it doesn’t seem like there is any point in using RAID 5, since
even in the best-case scenario RAID 5 will only sometimes use fewer disks.
That is where the cost and space effectiveness issues come in.

• Space-Effective Storage Allocation
If I want 2000 IOPS, 100% read-only, I can do that using 15 x 146GB 15k
RPM disks in RAID 5 or in RAID 10. In RAID 5 I will get ~1.5TB net space,
while in RAID 10 I will get ~1TB.
