
Veritas Cluster Server

Learning Document for HA Concepts & Veritas Cluster Server

By
Enterprise Services
Wipro Infotech, Delhi

Confidentiality
This document is being submitted to Adobe Pvt. Ltd. by Wipro Infotech, with the explicit understanding that the contents would not be divulged to any third party without prior written consent from Wipro Infotech.


VERITAS Cluster Server


This topic provides an overview of the key concepts, features, and benefits of VERITAS Cluster Server (VCS).

An Overview of VCS
VCS is an architecture-independent, availability management solution focused on proactive management of service
groups, or application services. It is equally applicable in simple shared disk, shared nothing, or SAN configurations
of up to 32 nodes and compatible with single node, parallel, and distributed applications. Cascading and multidirectional application failover is supported, and application services can also be manually migrated to alternate
nodes for maintenance purposes. VCS provides a comprehensive availability management solution designed to
minimize both planned and unplanned downtime.
Designed with a modular and extensible architecture to make it easy to install, configure, and modify, VCS can be
used to enhance the availability of any application service with its fully automated, application-level fault detection,
isolation and recovery. All fault monitors, implemented in software, are themselves monitored and can be
automatically restarted in the event of a monitor process failure. Monitored service groups and resources can either
be restarted locally or migrated to another node and restarted. A service group may include an unlimited number of
resources. Various off-the-shelf agents are available from VERITAS to monitor specific applications such as file
services, RDBMS and enterprise resource planning, or the product can be customized to monitor any hardware
component or software-based service. An SNMP agent allows VCS to generate SNMP traps so that resource state
changes can be communicated to any SNMP-based management tool such as HP OpenView, CA Unicenter, Tivoli
TME, and others. Although applicable to any application service that requires higher availability, VCS is most often
deployed in mission-critical enterprise environments such as file serving, database, and enterprise resource planning
(ERP).

The Industry's Most Scalable Availability Management Solution


Conventional cluster products rely on inefficient, point-to-point fault management and heartbeat mechanisms that do not scale well to large cluster configurations. To ensure scalability, VCS leverages a unique internode communication mechanism, called ClusterStat, that supports global atomic broadcast across a very low-latency transport. This internode communication protocol is faster, more reliable, and significantly more scalable than the
protocols in any of today's existing cluster products. In addition, all fault management has been multi-threaded to
speed recovery in large configurations, and efficient multi-level fault management ensures very low overhead in
configurations which may include thousands of managed resources. VCS supports 32 nodes today, but VERITAS
expects this product to support hundreds of nodes in the future.
Other features which support very large configurations include a Cluster Registry that is based on a single
configuration file auto-replicated between all nodes, support for an unlimited number of service groups and a
scalable, Java-based management GUI. A syntax checker built into the Cluster Registry minimizes operator error
during configuration, and the registry supports dependency definitions between managed resources. During
recovery, resources may either be started in parallel to speed recovery or according to the defined dependency
hierarchy. An auto discovery capability automatically recognizes new nodes as they are added to the cluster and
replicates the registry to them. Through the use of a scrollable, Microsoft Explorer-like management interface, the
VCS Cluster Server Manager (CSM) can easily provide a comprehensive view of the status of all service groups in a
single cluster with the ability to drill down for more detailed information or to perform administrative tasks with the
click of a mouse button. It can also manage multiple clusters, if so configured, across up to 32 nodes in a SAN
configuration from a single management console. VCS' ability to scale efficiently and manageably sets it apart from
other availability management products on the market today.
As enterprises move to SAN architectures, the scalability of cluster management software will play a key role in
efficiently leveraging large, centralized disk stores. More scalable software will allow more nodes to share
centralized storage, thus optimizing the use of storage and minimizing availability management and administrative
costs. It will also provide for a much better long-term growth path, allowing more nodes and disk arrays to be added
to accommodate even very rapid business expansion over a period of years.

VERITAS SANPoint Foundation Suite HA


This topic provides an overview of the key concepts, features, and benefits of VERITAS SANPoint Foundation
Suite HA (SPFSHA).

Overview of SPFSHA
SPFSHA extends VERITAS File System and VERITAS Volume Manager to support shared data in a SAN
environment. Using SANPoint Foundation Suite HA, multiple servers can access shared storage and files,
transparently to the applications and concurrently with each other. SANPoint Foundation Suite HA incorporates
VERITAS Cluster Server to provide cluster failover capabilities as well as internode communications across the
servers.

Features and Benefits of SPFSHA


SANPoint Foundation Suite HA makes shared storage possible and practical for a wide variety of applications.
Failover is faster in highly available configurations because a shared file system remains running even during a
single server failure.
Web serving gains manageability and scalability by accessing a common set of files for content serving on
a site. In the event of a server failure, applications can redistribute load by reassigning network addresses.
Workflow applications with large files, such as video production and CAD, can eliminate network traffic
and data copying for improved performance and easier manageability.
A backup process running on a separate server can access shared storage directly, reducing the impact of
backups on production systems and networks.
Transparent access to shared files
Using SANPoint Foundation Suite HA, multiple servers can mount and access the same file system on shared
media. No modifications to existing applications are required.
File system integrity in a shared environment
SANPoint Foundation Suite HA ensures the integrity of the shared file system by controlling access to the
file system structure using the global lock manager. It also manages cache coherence and locking, so that
systems accessing shared file systems always see the most current information.
Faster failover for high availability environments
SANPoint Foundation Suite HA includes the robust application-level failover capabilities of VERITAS
Cluster Server. Application failover is very fast, as another server can start a failed server's application
without having to restart the file system.
Cluster-wide management of SAN data
SANPoint Foundation Suite HA simplifies the management of shared data, with clusterwide logical device
naming, volume and file system operations.

Course Overview
System availability continues to receive wide attention as many organizations grow their critical business
applications on Local Area Networks (LANs). The primary reason to address availability issues is the cost of
downtime. You can establish an annual cost of downtime for every system and measure the benefits obtained by
solving the problems that cause a system to fail. You can then select among the various available options to improve
server uptime, based upon a reasonable cost and effort as well as a reasonable return on your investment.

Course Objectives
The overall goal of this learning experience is to provide a basic understanding of the concepts related to HA. This
course will build the foundation on which to base more advanced courses on VERITAS HA products. During this
course you will:
Define the general concept of high availability.
Identify HA storage management solutions at the disk level, such as hardware Redundant Array of
Independent Disks (RAID) and volume management software.
Describe the concept of clustering and investigate common clustering configurations.
Identify HA methods at the network level, such as redundant network connections and redundant networks.
Describe VERITAS HA products.

Lessons
Defining High Availability
What is High Availability?
Describe the concept of high availability.
The Need for High Availability
Identify the need for increased data availability in today's computer environments.
Types of Faults and Failures
Identify different types of faults and failures that can occur.
High Availability vs. Disaster Planning
Differentiate between the goals and functions of high availability and disaster planning.
High Availability vs. Fault Tolerance
Differentiate between the goals and functions of high availability and fault tolerant availability methods.
High Availability Planning
Identify guidelines to consider when planning a high availability solution.
The Layered Approach to Availability
Describe the concept of the layered availability approach.
Online Storage Management
General RAID Levels
Describe the various RAID levels.
Software RAID vs. Hardware RAID
Identify the advantages and disadvantages of hardware and software RAID.
Defining a Volume
Describe volumes and identify the advantages of using them.
VERITAS Volume Management: Virtual Objects
Describe the relationships between the virtual objects in VERITAS Volume Manager.
VERITAS Volume Management: Volume Layouts
Identify the volume layouts that are available in VERITAS Volume Manager.
VERITAS Volume Management: Hot Relocation
Describe the hot relocation process.
High Availability Clustering
Fault Resilient Clustering Concepts
Describe the general characteristics of fault resilient HA solutions.
Asymmetric 1 to 1 Configurations
Describe an asymmetric 1 to 1 configuration.


Symmetric 1 to 1 Configurations
Describe a symmetric 1 to 1 configuration.
N to 1 Clustering
Describe a traditional N to 1 networked cluster configuration.
N to 1 SAN Clustering
Describe clustering techniques in a Storage Area Network environment.
Failover Granularity in Clusters
Describe how resource and service groups enable application-level failover.
Highly Available Networks
Networking Overview
Describe general network components, concepts, and common topologies.
Public Network Failures
Describe failures that may affect the public service network.
Heartbeat Network Failures
Describe challenges to maintaining proper heartbeat communication between nodes in a cluster.
Redundant Networks and Network Connections
Describe how to configure redundant networks and network connections.
VERITAS Comprehensive Availability Solutions
VERITAS Comprehensive Availability
Identify the role VERITAS software components play in an overall high availability solution.
VERITAS Volume Manager
Provide an overview of the key concepts, features, and benefits of VERITAS Volume Manager.
VERITAS Storage Replicator
Provide an overview of the key concepts, features, and benefits of VERITAS Storage Replicator.
VERITAS NetBackup
Provide an overview of the key concepts, features, and benefits of VERITAS NetBackup.
VERITAS Cluster Server
Provide an overview of the key concepts, features, and benefits of VERITAS Cluster Server.
VERITAS SANPoint Foundation Suite HA
Provide an overview of the key concepts, features, and benefits of VERITAS SANPoint Foundation Suite HA.

What is High Availability?


You design a system, utilizing software and hardware components and implementing appropriate procedures, to
satisfy the basic functional requirements of your organization. This system functions properly assuming that no
faults or failures occur. However, whenever a fault or failure occurs that requires some type of maintenance
operation, an outage is observed by your users. An HA solution enables you to design, implement, and deploy software and hardware components that satisfy your functional requirements and provide sufficient redundancy to mask faults and failures from your users. This topic describes the general concept of HA solutions.

Defining HA
HA is defined as the ability of a system to perform its function without interruption for an extended period of time.
HA can be accomplished through special HA software and the implementation of redundant system and network
hardware components. In a properly designed HA system, all of the possible failure modes for critical applications,
network connections, and data storage have been identified and the recovery times have been analyzed. Therefore,
you can determine how long the system will be down for any given failure. You can scale an HA system to an
appropriate level so that in the event of a fault or failure, the system can recover to a known, consistent state in an
acceptable period of time.

Availability Statistics
System availability is expressed as a measure of the period of time that the system is functioning normally. This
involves the determination of the various component failures to factor into the overall rate of system failure. It is
important to note that there is a distinction between component failure statistics and system failure statistics. The
basic availability equation is used to determine the availability of a specific system component:
Availability = MTBF / (MTBF + MTTR)
Where MTBF is the mean time between failures and MTTR is the mean time to repair.
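The equation is easy to apply directly. As a small illustration (not part of the original course material), the following Python sketch computes component availability from MTBF and MTTR figures expressed in hours; the function name and sample values are our own:

```python
def availability(mtbf_hours, mttr_hours):
    """Basic availability equation: Availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Illustrative values: a component with a 10,000-hour MTBF and a 4-hour MTTR
print(round(availability(10_000, 4), 5))   # 0.9996, i.e. roughly 99.96% availability
```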
MTBF
MTBF = Total actual operating time/Total number of failures
The MTBF is an expected future performance based on the past performance of a system component. If the
component is new, there is no historical data to base the MTBF upon. When determining the MTBF of new
hardware components, you should obtain these statistics from the particular vendor. However, these statistics may be
inflated or have been calculated using a high standard deviation.
MTTR
The MTTR is an average amount of time that it takes to repair a component, based upon actual statistical data. When
calculating the MTTR, you can consider only the amount of on-site time that it takes to recover the component from
the time when it failed. You can also calculate the MTTR including factors such as unavailability, response time, and
travel time, in addition to on-site repair time. Many aspects of MTTRs are out of your control. For example, you
may need to replace a specific part of a server. If this part is not currently in your stock, you will have to purchase
the replacement component from the vendor or some other source and rely solely upon their ability to deliver the
part in a short amount of time.
System Availability
As stated earlier, to calculate the availability of a system, you must take into account the availability of the
individual system components such as servers, disks, I/O cards, etc.
The more hardware the system features, the more likely the system will fail. It is here that the effect of having many
of a single type of component affects the availability of a system.
For example, suppose a new disk has a quoted manufacturer's MTBF of 600,000 hours, which indicates that a disk
would be expected to fail once in about 70 years. This MTBF is calculated rather than based on actual failures. In
addition, this MTBF value considers only the disk mechanism itself. If you factor in the power supply, controller,
and fans, the MTBF becomes about 150,000 hours, or about 17 years. If your system utilizes 500 disks, the failure rates add across the disks, so the MTBF for 500 disks is 150,000 hours divided by 500, or only 300 hours. This means that the system would fail about 30 times a year due to disk failure. The best way to reduce the frequency and
duration of failures that affect the system is to employ a properly designed HA solution.
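The disk arithmetic above can be reproduced in a few lines. The following Python sketch simply restates the calculation in the text, assuming identical, independent disks whose failure rates add:

```python
HOURS_PER_YEAR = 8760

mtbf_single_disk = 150_000   # hours, including power supply, controller, and fans
disk_count = 500

# With independent disks, failure rates add, so the aggregate MTBF divides by the disk count.
mtbf_all_disks = mtbf_single_disk / disk_count        # 300 hours
failures_per_year = HOURS_PER_YEAR / mtbf_all_disks   # about 29, i.e. roughly 30 failures a year

print(mtbf_all_disks, round(failures_per_year, 1))    # 300.0 29.2
```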

The Rule of the Nines


Availability is often measured by the "rule of the nines".
Percentage Uptime      Percentage Downtime    Downtime Per Year        Downtime Per Week
98%                    2%                     7.3 days                 3 hours, 22 minutes
99%  (2 nines)         1%                     3.65 days                1 hour, 41 minutes
99.8%                  0.2%                   17 hours, 31 minutes     20 minutes, 10 seconds
99.9%  (3 nines)       0.1%                   8 hours, 45 minutes      10 minutes, 5 seconds
99.99%  (4 nines)      0.01%                  52.5 minutes             1 minute
99.999%  (5 nines)     0.001%                 5.25 minutes             6 seconds
99.9999%  (6 nines)    0.0001%                31.5 seconds             0.6 seconds
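The downtime columns follow directly from the uptime percentage. A minimal Python sketch of the conversion (the helper name is our own) is shown below:

```python
def downtime_hours(uptime_percent, period_hours=8760):
    """Downtime, in hours, for a given uptime percentage over a period (default: one year)."""
    return (1 - uptime_percent / 100.0) * period_hours

print(round(downtime_hours(99.0), 1))       # 87.6 hours/year, about 3.65 days
print(round(downtime_hours(99.999), 3))     # 0.088 hours/year, about 5.25 minutes
print(round(downtime_hours(99.0, 168), 2))  # 1.68 hours/week, about 1 hour 41 minutes
```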
For most environments, 99% availability is adequate. This level of availability results in less than 2 hours of
downtime a week. It is important to consider when this downtime is taking place. For example, if a typical business
system is down on a Sunday between 3 A.M. and 4:30 A.M., this is more acceptable than if the system is down
during a Tuesday afternoon between 2 P.M. and 3:30 P.M. It is also important to consider when 100% availability is
required. For example, suppose that a brokerage house performs all stock transactions between 9 A.M. and 4 P.M. on
weekdays. If the system is designed for 99% availability, it is crucial that you ensure that no system downtime
occurs during the most critical business hours.
HA Requirements
There is a trade-off in costs and benefits for various degrees of availability. When designing a system with HA
requirements, the initial requirements often include:
System availability at all times with no perceived loss of service
No loss of data at any time
Maintenance and upgrade activities do not interfere with operational service
Without being properly informed of the total costs and consequences of implementing a system that satisfies these
requirements, it is natural to want an HA solution to satisfy these lofty goals. 100% data availability is an ideal
concept, but the implementation of this solution results in very high monetary, performance, and complexity costs.
As you move from lower to higher degrees of availability, the costs can increase dramatically. In most environments,
a step from one level to the next (for example, from 99% to 99.9%), increases costs 5 to 10 times.
It is ultimately the responsibility of an HA system designer to determine:
The degree of availability that is actually required by the users, as opposed to what they might like to have
The technological alternatives that can be used to meet these requirements
All of the costs: not only monetary, but also performance degradation and system complexity.

The High Availability Equation


One way to look at high availability is to view it as a simple equation. The effectiveness of any HA system must
include reducing the time required to recover from a fault and simplifying management of the system to help enable
you to scale and grow your system.

Time to Recovery
Most enterprise environments feature a wide range of systems, ranging from on-line e-commerce systems to less-critical human resources (HR) systems. It is important to analyze the required recovery times of the various systems in your enterprise by performing a business impact analysis. Currently, there is a lot of work being done in this area by organizations in the analyst community such as the Gartner Group, META Group, and International Data Corporation (IDC), among others. Typically, you can break the systems in an enterprise down into five basic levels
based upon the time to recovery requirements:
Safety critical
Mission critical


Business critical
Task critical
Task non-critical
Examples of safety critical applications include systems that manage a nuclear reactor or maintain patients'
heartbeats at a hospital. At the other end of the spectrum are task non-critical systems, such as an HR system, that can
probably withstand an extended outage without significant impact on the overall enterprise.

Levels of Availability

It may be acceptable for a task non-critical system to have a recovery time measured in days or tens of hours. For
these systems, basic availability, such as a traditional offline tape backup, is sufficient. If you lose your HR system,
you can simply recover it from a secondary copy of the data from tape and bring the system back online in a number
of hours. If the recovery process takes a day or two, the downtime will not significantly impact users.
For business and mission critical systems, you should use a different availability approach. For example, rather than
restoring from an offline copy, you can recover from an online copy of the data. You can utilize technology such as
replication, snapshots, and mirroring to reduce the time to recovery to tens of minutes up to a couple of hours. For
even more critical systems, you can reduce the recovery time to minutes or seconds by using clustering.
There is a wide range of data availability possible. However, this range can be divided into four common levels of
availability:
Basic Availability
A basic availability environment requires no specific planning for downtime. Backups might be taken to protect data, but the time required to restore the data can be quite extensive in this environment. Basic availability can be adequate for many applications, but if downtime causes any significant costs, you should consider a higher level of availability. Task non-critical systems would probably feature a basic availability solution.

Increased Availability
This level of availability is achieved by employing RAID (redundant array of independent disks) technology to provide online data protection in addition to the advantages of basic availability. RAID is an array of disks in which redundant data is stored in different places on multiple disks. RAID technology is described in detail in the "RAID Basics" section of this course. A task critical system might employ an increased availability solution.


High Availability
In an HA architecture, hardware and software failures may occur. However, the intent is to mask the failure from the
user and to reduce the time needed to recover from that failure down to several minutes or less. It is important to
note that HA solutions are not fault tolerant. It is possible for all of the systems in an HA configuration to fail. The
goal of an HA strategy is to recover as soon as possible from a system failure, rather than ensuring that a failure
never occurs. In a simple example of an HA system, two independent servers are logically connected to form a
cluster. One server stores a copy of every component of the other system. If a failure occurs on the primary server, files or services can be taken over by the secondary server. In addition to masking the failure, HA methods enable
you to significantly reduce recovery times to a matter of minutes in the event of a major system failure. Typically,
business critical and mission critical systems would need to use an HA solution.

Continuous Availability
The most advanced level of availability is continuous availability (CA). CA is defined as an environment explicitly designed to eliminate all computer downtime, both unplanned and planned. Today, CA environments approach 99.999% availability, or less than 5 minutes of downtime per year. However, it is important to note that the costs for CA systems can range into the millions of dollars. Examples of industries that most often utilize continuous availability solutions include air-traffic control and stock-floor trading systems.
Advanced CA architectures usually feature proprietary, large, hardware-based fault tolerant host machines. In a fault
tolerant system, hardware is designed to perform self-checking diagnostics and all of the main hardware components
are physically duplicated. Self-checking resides on each major hardware component and detects and isolates failures
instantly. This ensures that erroneous data cannot corrupt other system areas. In fact, some diagnostics built into
specific CA architectures often automatically detect problems before they lead to failures, and initiate service
instantaneously should a component fail. Component duplication enables normal processing to continue even in the
event of a hardware failure, with no performance degradation. Safety critical systems would require a CA solution.
Simplified Management
In a typical data center environment, you may have a number of servers that have different operating systems:
Solaris, HP-UX, Windows NT, and Windows 2000. The system might feature a number of network connections as well,
such as traditional Ethernet or SCSI connections, fibre-type connections, or storage area networking (SAN). There
are also various types of storage devices in the system.
Today's enterprise is a very heterogeneous environment. In addition, almost every environment is growing at
tremendous speeds. This requires more disk storage, different types of storage, more systems, applications,
networks, etc. How do you manage all of this? The second part of the high availability equation is simplifying
management.


It is important for the enterprise to feature an infrastructure that enables scalability required by future demands. In
addition, you need to implement a solution that enables you to perform automated tasks, virtualization, and
consolidation across all systems in the enterprise, no matter the platform or operating system.

The Need For High Availability


Historically, only a select number of applications were considered critical enough to require an HA solution. In the
past several years, the cost of systems has been significantly reduced and many new technologies have emerged in
the business landscape, such as fibre-based Storage Area Networks (SANs). Modern applications have improved
user productivity and increased the speed of business transactions. Modern businesses are much more dependent on
the availability of their computer systems. This topic identifies the need for increased data availability in today's
computer environments.
To make a business successful, employees and customers need to have access to their data and the services to work
with that data. In today's E-commerce environment, customer expectations require round-the-clock data availability.
The maintenance of corporate data and access to the data is a business necessity. Critical applications and services
include:
Database servers
File servers and filesystems
Web servers
Enterprise Resource Planning (ERP)
Application servers
There are many different reasons for implementing an HA solution. Typically, there are two situations that an HA
solution is designed to address:
The system crashes due to an unforeseen fault or failure.
The system is brought down intentionally for system maintenance and upgrade.
Originally, it was the utility companies that led the way toward more available systems and applications. Now,
global business and E-commerce are having a significant impact on the definition of acceptable system availability.

Data must be available round-the-clock. Regular business hours do not exist in our contemporary global
marketplace. For example, an Internet service organization must account for customers arriving at their site at any
hour of any day.
In addition, most modern organizations depend on networking technologies. More and more business-critical data is
available through networks. Access to corporate information and shared knowledge has significantly improved
productivity and communication. However, this reliance on network solutions has also helped to create a need for an
HA solution to ensure that the network is resilient to failures.
These new requirements are creating greater demands on the corporate information technology (IT) infrastructure. In
the past, it was acceptable to expect 99% system availability. This would equate to about 3.5 days of downtime per
year. However, the growth of E-commerce, greater demands for customer service, an increased dependence on
network solutions, and a competitive global market have contributed to a need for high availability. When you
consider the new costs of downtime, 99% system availability is no longer acceptable.

The Costs of Downtime


Before you can analyze the costs of an HA solution, you should consider the cost of not implementing such a
solution. For example, in the highly competitive world of Web-based brokerage houses, one hour of downtime can
cost a firm an estimated $6.5 million. Gartner Group and Dataquest studies indicate that in 2000, downtime
cost United States firms over $4.6 billion.
Industry          Business Operation                 Average Downtime Cost Per Hour
Financial         Brokerage operations               $6.45M
Financial         Credit card/sales authorization    $2.6M
Media             Pay-per-view TV                    $150K
Retail            Home shopping (TV)                 $113K
Retail            Home catalog sales                 $90K
Transportation    Airline reservations               $89.5K
Media             Telephone ticket sales             $69K
Transportation    Package shipping                   $28K
Financial         ATM fees                           $14.5K
These numbers only represent direct monetary losses. They don't include less obvious losses, such as lost
opportunities or customers moving their business to a competitor. Downtime can adversely affect your corporate
image in the industry as well. Competitors may discover this loss and spread the news through the corporate
community. Today, many companies find themselves relying on their systems to provide data continually to
facilitate employee productivity, improve corporate image, and better serve their customers.
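Figures such as these can be combined with an availability target to estimate an annual cost of downtime for a system. The sketch below is illustrative only; the availability level and hourly cost are assumed inputs, not figures from the studies cited above:

```python
HOURS_PER_YEAR = 8760

def annual_downtime_cost(uptime_percent, cost_per_hour):
    """Estimated yearly cost of downtime at a given availability level."""
    downtime = (1 - uptime_percent / 100.0) * HOURS_PER_YEAR
    return downtime * cost_per_hour

# Assumed example: a brokerage operation running at 99.9% availability, $6.45M per downtime hour
print(f"${annual_downtime_cost(99.9, 6_450_000):,.0f} per year")   # about $56,502,000 per year
```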

Types of Faults and Failures


Before learning about HA solutions that can be used to recover from a fault or failure, it is useful to explore faults
and failures in more detail. This topic identifies different types of faults and failures that can occur. There can be a
distinction made between faults and failures: faults are often defined as non-compliances within the system that may or may not be externally visible to the end user, whereas failures can be defined as those faults that are externally visible. Within this course, the terms fault and failure are used interchangeably.


Defining a Failure

A failure is a deviation from the expected behavior of the system. In other words, if the system is specified to
exhibit a certain functionality, and in the process of execution the system produces a discernibly different
functionality, a failure has occurred. Functionality is typically delivered from the system by running a procedure to
execute the logic contained in software that runs in a hardware environment containing client and server machines,
networks, data storage, and other peripherals. Failures can occur in any of these software procedures or the hardware
in a system.
Failures can be classified as either:
Reproducible
A prescribed set of actions leads to the observance of the failure in a predictable manner.
Hard reproducible failures occur identically on every execution with the same input.
Soft reproducible failures might occur with a certain probability on identical executions.
Nonreproducible
The appearance of the failure is random, or is linked to a root cause outside of the environment for which
the system was engineered.
HA solutions are useful in dealing with soft reproducible and non-reproducible failures, but less effective with hard
reproducible failures.

Types of Possible Nonreproducible Failures


There are several different types of failures:
Physical Hardware Failure
Although the industry has come a long way in increasing the MTBF rates for individual hardware packaging and
mechanical components, hardware is still vulnerable to faults. Hardware failures are typically non-reproducible. For
example, a hard drive crashes or a tape library breaks. The most common examples of hardware failures include:
System memory or CPU failures

Some contemporary computer systems have the ability to reconfigure a failed component without
requiring a reboot of the system. This capability helps increase data availability in the event of CPU or
memory failures.

Backplane failure

Backplanes, or motherboards, are the large circuit boards that contain sockets for expansion cards and
provide the general pathway for all data in a computer system. These components rarely fail, but they can
fail in some circumstances.
In addition to the expansion sockets, active backplanes also contain logical circuitry that performs CPU
operations.
Passive backplanes contain almost no computing circuitry. Usually, the CPU is inserted on an additional
card in the passive backplane. Passive backplanes enable you to repair failed components or upgrade to
new components easily.

Disk failure


Disks are very prone to failures because of the high rotation speed, low tolerances, and
possible problems with the controller boards or cables.

Tape device failure

Tape devices have similar characteristics to disks, such as high speeds and low
tolerances, and are also failure-prone. In addition, tape devices are repeatedly stopping
and starting. These actions may strain or overheat the motor and lead to motor failure.

Fan failure

Fans can also fail. If the cooling system fails, the effects may not be immediately visible, but over time
excessive heat can cause a system to act unpredictably or fail at an undesirable point in the future.

Power supply failure


Power supplies often have the worst MTBF of all components in a system. They can fail instantly or over time. The
gradual failure of a power supply can cause intermittent failures or unpredictable behavior in other components.
Failures in power supplies are caused by excessive switching, varying voltage levels, or other stress-inducing
factors.

Network Interface Card (NIC) failure

NICs are expansion boards inserted into a computer so the computer can be connected to
a network. If a NIC fails, network connectivity is lost. It may be difficult to detect a NIC
failure. A simple method used to detect these failures is to initiate some network traffic,
and then use a command to display the packet count. If the packet count does not
increase, it is likely that the NIC has failed. Redundant NICs should be used to avoid any
loss of network connectivity due to the failure of a single NIC.
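As a rough illustration of the packet-count check described above, the following Python sketch reads an interface's packet counters twice and reports whether they advanced. It is not a VCS agent or part of any VERITAS product, and it assumes a Linux host that exposes counters in /proc/net/dev; on other platforms you would parse the output of a tool such as netstat instead:

```python
import time

def rx_tx_packets(interface):
    """Return (rx_packets, tx_packets) for an interface, read from /proc/net/dev (Linux only)."""
    with open("/proc/net/dev") as f:
        for line in f:
            if line.strip().startswith(interface + ":"):
                fields = line.split(":", 1)[1].split()
                return int(fields[1]), int(fields[9])   # fields 2 and 10 are rx/tx packet counts
    raise ValueError(f"interface {interface!r} not found")

def nic_seems_alive(interface, wait_seconds=5):
    """Crude liveness check: wait for (or generate) traffic, then see if the counters moved."""
    before = rx_tx_packets(interface)
    time.sleep(wait_seconds)            # in practice, initiate some network traffic here as well
    after = rx_tx_packets(interface)
    return after != before

print(nic_seems_alive("eth0"))          # False suggests the NIC, or its link, may have failed
```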
Environmental Failures
Failures can not only be caused by internal system components, but also by environmental forces beyond your
control. Such environmental failures include:
Power fluctuations or outages
The most common external source of system failures is power outages. Things to consider in determining
the probability of power outages should include, but not be limited to, the history of local utility companies
providing uninterrupted service, the history of brownouts due to high temperatures in your area, and your
proximity to major power sources.
Cooling system failure
The environmental cooling system can fail. This would cause massive overheating of some of your crucial
system components. You should analyze your facilities' environmental control system for the likelihood of
failure.
Structural failure

Structural failures can range from the complete collapse of the building's support structure, to the structural
failure of a single computer rack or cabinet.
Natural disasters
Natural disasters are occurrences such as fires, floods, earthquakes, typhoons, or hurricanes. Considerations
when identifying your organization's susceptibility to natural disasters can include geographic location, the
topography of the land, or the history of natural disasters in the local area.

Human Error or Acts of Terrorism


The causes of system failures are not limited to natural causes; failures can also be caused by human error or acts of
terrorism.
Human error
A failure can result from an operator or administrator issuing an inappropriate command or an individual
disrupting the system by accidentally tripping over a cable or unplugging a power supply.
Acts of terrorism
Unfortunately, in the contemporary computing world, there are many examples of terrorism: sabotage,
vandalism, arson, robbery, vehicle crashes, hazardous waste, civil disorder, war, or malicious computer
crimes. Human threats are difficult to identify, but you might consider such things as your proximity to
major highways that might transport combustible or otherwise hazardous materials, the implementation of
virus protection software, or the degree of employees' access to your computer facilities.
Network Failures
Networks are susceptible to failures in every component within the network. Network failures include physical
failures that can take place in many network-specific hardware components, such as switches, network cables, or
NICs (network interface cards). In addition to these physical components, networks feature complex configurations
and service information that, if misconfigured or not authenticated properly, can lead to countless failures that can
bring down a network.
Database Failures
A database system typically features a large source of data and many sub applications to use to extract specific
information from this data based on specified conditions. Failures can occur at any level in a database system,
ranging from a catastrophic failure of the main database engine to a temporary hang in the client-side application.
Web servers often interact with back-end database servers and therefore all the possible failures that can occur in a
database application can adversely affect a Web server as well.

High Availability vs. Disaster Planning


This topic differentiates between the goals and functions of high availability and disaster planning.

Disaster Recovery
The ability to recover from a natural disaster, such as fire, flood, or earthquake, in a short time is called disaster
recovery. The results of these disasters include physical damage to systems, and loss of data, telecommunication,
power and work space. Recovery time might be as short as minutes or hours, or as long as days or weeks.
Frequently, recovery time is directly related to how quickly a system can be accessed, the data and applications
loaded, and telecommunications restored. Redundancy is usually provided by a duplicate system at a different,
geographically remote site.
The need for disaster recovery solutions and services is increasing rapidly. The costs after a disaster become quite
large, and the need to restore access to systems and applications becomes very important. Two important issues
associated with disaster recovery are the replication of data and the currency of the data. The replication of data to
an alternate site is affected by distance and speed of the links. The slower the replication method, the more data will
be lost in case of disaster. The impact of a disaster on the organization must be assessed along with the cost of
providing for disaster recovery.

Comparing Disaster Recovery and HA


Disaster Planning                                               High Availability
Critical                                                        Urgent
Testing and planning for a theoretical event                    Optimizing and responding to current events
Offsite/offline data storage                                    Everything online
Notification of loss by event; possible advance notification    Notification of loss by users
HA involves system optimization and the ability to respond to current events. HA is an urgent requirement for most
organizations. Disaster planning and recovery should be a critical concern to most organizations. Disaster planning
requires continual testing and refinement. Opportunities to conduct real world drills are scarce or non-existent. In
most cases, testing your disaster recovery plan is more of a theoretical exercise than a real world experience. HA is
much more of a day-to-day operation and therefore, more organizations neglect disaster planning in favor of HA. A
properly developed disaster recovery plan should involve offsite and offline storage for recent copies of your data.
HA strives to keep all data available at all times. In the event of a disaster, you will often know about the loss of data
before your users will. In an HA environment, users will often inform you in the event of the loss of data or any
system downtime.

HA and DR Together
When defining a disaster recovery plan, your top priority is your mission critical applications. Mission critical
applications are required to be available at all times. While backup and recovery technology ensures data protection,
recovery methods are often not fast enough to handle the recovery of data used by mission critical applications. HA
methods such as replication and clustering can help to ensure immediate recovery whenever a disaster strikes.

This example illustrates a plan that addresses HA and disaster planning. By implementing a configuration with
cluster management and replication concerns, you can effectively maintain and protect your end-users and
information. You can manage clusters and move applications running at a primary site to a secondary site, while
maintaining access to critical information through the continuous replication of data between sites. Clustering and
replication are covered in more detail later in this course.

Disaster Planning Using Storage Level Data Replication


Storage level data replication is a popular choice for disaster planning. However, replication solutions must ensure
that your data is replicated with full integrity. Replicated data should be consistent, up to date, and ready to use at a
moment's notice, while also being transparent to the application. The replication of data should also be seamless,
such that the application data can be sent from one primary site to multiple secondary sites for greater protection.
Replication products should not rely on any dedicated networks or vendor-specific storage hardware platforms, in
order to offer better protection against a single point of failure and offer greater flexibility for change and growth.
Clustering with replication offers mission critical applications the optimal mechanism for immediate recovery. If
both the machine and the disks fail, recovery can occur in such a way that the application can fail over to another
machine in the cluster using the replicated data.

High Availability vs. Fault Tolerance


This topic differentiates between the goals and functions of high availability and fault tolerant availability methods.
A system described as fault tolerant contains multiple hardware components that function concurrently, replicating
all of the I/O. This type of system protects against hardware failures by incorporating redundant hardware
components in a single system. Fault tolerant systems can cost as much as ten times more than non-fault tolerant, but highly available, solutions.

Defining Fault Tolerance


Fault tolerance extends the definition of high availability. This term is used for systems that can tolerate nearly any
type of possible fault without going down. This is a solution used by industries like power companies and telephone
companies.
Fault tolerance guarantees 99.9999% availability, or approximately 30 seconds of downtime per year.
Fault tolerant systems are very expensive because of the way they are designed.
They include complete hardware redundancy with no single point of failure from a hardware perspective. The only
situations that can cause downtime in a fault tolerant system are software or application failure, or a catastrophic
environmental disaster. Hardware redundancy is necessary, but not sufficient for fault tolerance.
A fault tolerant system must also feature some sort of redundancy management. For example, a system may provide
redundant hardware components to ensure that at least one result is correct in the presence of a fault. If a user must
somehow examine the results and select the correct one, then the only fault tolerance is performed by the user.
However, if the system selects the correct redundant result for the user, then the system is not only redundant, but
also fault tolerant.
Fault tolerant systems cannot run on typical configurations because their specialized applications must communicate
directly with the hardware, sometimes for each transaction.
Although 99.9999% availability is appealing, the costs of this solution are well beyond the affordability of most
companies.

Characteristics of a Fault Tolerant System


It is important to note that fault tolerant solutions and HA solutions are two very different concepts. A fault tolerant
system:
Is not impacted in the event of a fault or failure.
Features no loss of access.
Enables immediate and transparent recovery.
Includes replacement, or spare, hardware components that are on-line and running in sync with the primary
system.
Is expensive.
Is limited in scalability.

Does not use off-the-shelf hardware and software. The hardware usually has very specific software hooks,
and applications need to be written to a specific API of the operating system.
Requires a specially modified operating environment.
Features inherent redundancy management.

Fault Tolerance Processes


Fault tolerance involves the following actions in the event of a fault or failure:
Detection
The system determines that a fault or failure has occurred.
Diagnosis
The system identifies the precise subsystem or system component that failed, and determines the immediate
cause of the fault.
Containment
The system prevents the propagation of faults from their origin to a point in the system where the fault could have an effect on the service delivered to the user.
Masking
The system ensures that the correct output is passed to the user in spite of the failed component.
Compensation
It may be necessary for the system to provide a proper response to compensate for the output of the faulty
component.
Repair
The system removes the fault from the system or recovers the system. In well-designed fault tolerant
systems, faults are contained before they propagate to the extent that the delivery of system service is
affected. This leaves a portion of the system unusable because of residual faults. If subsequent faults occur,
the system may be unable to cope because of this loss of resources, unless these resources are reclaimed
through a recovery process which ensures that no faults remain in system resources or in the system state.

High Availability Planning


Determining an organization's availability requirements and architecting a system to meet them is a complicated
process. This topic identifies guidelines to consider when planning a high availability solution.

Guidelines When Planning an HA Implementation


When you are planning your HA system, you need to consider many different factors. Because every environment is
unique and every business has different needs, it is difficult to create an all-encompassing checklist for planning an
HA system. This list addresses the most important guidelines to consider:
Determine the cost of downtime.
It is difficult to estimate the cost to an organization if the system goes down. In general, the consequences
of a serious system failure will vary depending on the characteristics of the specific business. The
investment in a high availability solution should match the cost and risk of unavailability.


Understand the recovery point and recovery time.


It is important for you to determine when recovery is necessary in your system's operations and how long a time
exists between the point of failure and recovery. The recovery point is more significant in data-centric operations
where any loss of data is unacceptable. Recovery time is most important in transaction-centric environments. For
example, this simple diagram illustrates how the estimated recovery times and recovery points relate in CA,
electronic vaulting, off-site storage, and standard HA scenarios.

Protect appropriate system components.


When you design your high availability solution, you should allocate more money to protect specific
system components. When determining the specific system components to protect, select the components
that would have the most impact in the event of a failure, are most likely to fail, or are the most expensive
to replace.
Focus on the areas that can have a significant, negative impact on the ability to keep an application
and your organization up and running if they fail.
You should consider which components are most likely to fail, because these will have the most
harmful effect on the MTBF values.
Protect components that may be expensive to replace in the event of failure.
Isolate and eliminate any single points of failure (SPOFs).

A SPOF is any system component that will cause downtime if it fails. It is important to investigate the path
of execution in your system and identify all the weak links in the chain. If one link breaks, the whole
system fails no matter how well constructed the rest of the system. You should walk through the whole
process from your servers and disk storage, to the applications, through the network, and to the client
systems. Common SPOFs are:
the computer system
Clustering software can be used to link several systems that can each run each other's applications
in the event of a failure of the primary system.
disks
Disk mirroring or disk array technology can be used to protect data.
host adapters and cables
Host adapter failures can be protected against with operating system features and redundant host
adapters.
networks
Networking has many hardware components; each could be a SPOF. The key to eliminating
failures within the network is understanding the topologies being used, understanding the failure
points within those topologies, and removing these failure points from the network. There are
many hardware and software products which provide increased network availability.
electrical power
Uninterruptible Power Supplies (UPSs) and/or multiple power sources can protect against
electrical power failures.
Ensure the security of the system.
Prevent data corruption and unauthorized access to your system. Security is an issue that is often
overlooked in discussions of HA management, because it does not immediately reduce the impact of
failure. However, it is important to any HA solution. The management center must be secured, so that only
authorized personnel have access to it. The management systems, or applications, also need to support
some type of user authentication, such as userIDs and passwords. Secure transactions between the
applications and the system components are available through Remote Procedure Calls (RPC), or some
other protocol. Secure communications should be implemented whenever possible in an HA configuration.
Centralize similar applications and services on large servers.
It should be noted that this is not a steadfast rule; sometimes many small machines running single instances
of databases or single applications can be a more appropriate configuration. In general, by consolidating
similar applications and services on centralized large servers, you can significantly reduce the complexity
of your system, the number of backups that are required, and the number of components that can fail.
Automate repetitive tasks.
You can significantly reduce the number of hours required for hands-on operations by automating the tasks
that are standard and repetitive. In addition, automation reduces the number of possible faults due to human
error, such as mis-typed commands or accidental file deletion. You can also update and maintain consistent
policies and procedures in a single centralized location.
Perform a thorough test initially and perform additional tests on a regular basis once the system is up
and running.
Before you deploy your HA solution, you should perform a thorough test that investigates every level in
your system, from hardware component faults to network failures. The testing environment should mimic
the eventual system environment as closely as possible: the same hardware, software, services, networks,
configurations, loads on the system, and users.

It is also important to perform tests on a regular basis once the system is up and running. Systems and
environments are constantly changing. The only way to ensure that the system can react to failures
appropriately at any given point in time is to test the system throughout its life cycle.

Account for future growth.


It is important that any HA solution account for scalability. All data systems will expand with time. It is
much less expensive and easier to manage this growth if it is planned for early in development. For
example, it is much easier and cost-effective to add disks to large servers with many empty slots than it
would be to purchase additional servers.
Document policies and procedures.
While you are initially planning your HA system and while your system is being implemented, it is
important to document every policy or procedure that you develop. This documentation can serve as an
official archive of system information, a source for any troubleshooting actions that may be required in the
future, and it can ensure that other individuals can access vital system information in the event that you are
unavailable.

This documentation can be in a variety of formats:


HTML
This is probably the most common choice. This format is extremely portable and can be read in
any browser. The major consideration with this format is that you ensure that you make relative
references to the servers rather than hard links to particular URLs. HTML may prove to take up a
little more room than other formats.
PDF
Adobe PDF documents are very compressed and platform-independent. The only major drawback
is the limited ability to edit the documents in their native format.
Word processor
This format is the easiest to manage, however access to an appropriate reader may be an issue.
This is not as portable as other formats.
Paper documentation (hard copies)
Soft copies of your documentation are much easier to update than hard copies. However, you may
want to print a limited number of copies of your documentation once in a while to refer to quickly,
in case of a complete or extended system outage.

Select the appropriate software and hardware.


You should select the appropriate software to maximize data availability for your organization. There are
many considerations when selecting this software. Your data management software should feature
capabilities for clustering, load-balancing, application-level recovery, intelligent system and application
monitoring, centralized management, and you should select software that will be easy to troubleshoot
through mature customer/technical support and consulting organizations. You should always arrange for on-site consulting to help you implement your HA solution. It is also a good idea to take advantage of other resources such as the product documentation, user groups and news groups, the software company's Web site, and classroom training on the products.

There is a direct correlation between the reliability of your hardware and your overall system reliability.
It is important that you obtain appropriate reliability data from hardware vendors, such as mean time
between failures (MTBF) figures that are proven and realistic. There are several other hardware considerations in
addition to reliability, such as ease of repair, ease of access, cost, compatibility, and storage capacity. It is
also a good idea to purchase spare hardware for components that may be more prone to fail than others.

Do not overcomplicate the system.


This is a very important guideline to consider when designing an HA solution, and is half of the availability
equation. There are many points in any system at which failures can occur. You should always try to keep
the design simple. For example, you should eliminate any extraneous system components, maintain servers
that are running only a single application or service, and choose a naming convention throughout your
system that is easy to remember and organize.
Reduce planned downtime.
Downtime is best defined as the period of time in which a user is unable to perform tasks in an efficient and
timely manner due to poor system performance or system failure. In data centers worldwide, a lot of
attention and investment has been made to ensure redundancy and high availability of hardware system
components, the vessels which process and hold corporate data.

In a study published by the IEEE (Institute of Electrical and Electronics Engineers), hardware
failures are the cause of only 10% of total system downtime. As much as 30% of all downtime is pre-scheduled,
and most of this time is required due to the inability of system tools to permit online
administration of systems. Another 40% of downtime is due to software errors. Some of these errors are as
simple as a database running out of space on disk and stopping its operations as a result. Any
comprehensive HA solution has to be able to deliver application and information availability in the event of
any cause of downtime.
Examples of planned downtime include those times when the system is shut down to add additional
hardware, upgrade the operating system, rearrange or repartition disk space, or clean up logfiles and
memory. If you implement an effective HA strategy, you can significantly reduce the amount of planned
downtime. For example, you can provide for backups, maintenance, and upgrades while the system is up
and running. You can also reduce the time required to perform the tasks that can only be done while the
system is down.

Balance the cost of the availability solution with the rewards.


The cost of purchasing, implementing, and managing the HA solution should be consistent with the
operational loss you wish to prevent. Achieve an appropriate trade-off between the cost and the rewards of
an HA system. The relationship between cost and return on investment in HA systems can be viewed as a
curve that illustrates the law of diminishing returns: as you move from a less expensive, simple
solution to more advanced solutions, the costs increase dramatically.

The Layered Approach to High Availability


This topic describes the layered availability approach and introduces the concepts and terminology involved in the
availability issues posed by each layer. The range of layers includes the application layer and the storage management
layer, which enables you to manage logical, or virtual, storage volumes. The storage network infrastructure layer
features such components as hubs, switches, and Fibre Channel connectivity. Finally, the disk and data storage layer
contains the tape libraries, intelligent disk arrays, and other storage devices. The concepts and terminology
introduced in this topic are covered in greater detail in other topics throughout the course.


To simplify the management of a complicated system, you can break the system down into four basic layers:
Application layer
Storage management layer
Storage network infrastructure layer
Data storage layer
In order to reduce the time of recovery, you need to determine the level of service that each layer must deliver to the
others. You can also simplify management by logically organizing the resources in each layer.


Application Layer

The application layer is the direct interface between the system and the client machines, such as a database, an e-mail
server, or a custom application. HA solutions feature functionality that transparently provides continuous service or
access to applications in the event of a fault or failure. Throughout this course, it is important to view
your system from an application-based viewpoint. In other words, no matter what components, structure, policies,
and procedures are implemented in your HA solution, the most important consideration at any time is to minimize
the impact of a fault or failure on the user's ability to access data through the application or service. HA issues
involved in this layer include clustering, application-level failovers, simplified management of large server farms,
common availability management, and replication of data to multiple sites.

Storage Management Layer

The storage management layer refers to the method by which the server manages the storage devices or disks. This
management is performed by the building blocks of an HA solution: volume management and a journaling
filesystem.
Volume Management
Often, the first step taken towards increasing a system's availability is to enable software-based redundancy of disks,
or software RAID. Software RAID defines a logical volume. A volume is a logical object on which filesystems are
written or to which databases write their data. Software RAID is often packaged with volume management software.
Journaling Filesystem
A file system is a collection of directories organized into a structure that enables you to locate and store files. All
information processed is eventually stored in a file system. When a system or server fails, the filesystem can be
corrupted or lost, and a restore from tape backup may then be required. A journaling filesystem
journals the changes to the file system structure (and occasionally data). If the system crashes and is rebooted, the
journal is replayed to ensure the correctness of the file system structure. Data recovery is dependent upon the
specific application. For example, recovery of an Oracle database would require the use of Oracle log files.
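The journaling idea can be illustrated with a small, hypothetical sketch (Python, not any real filesystem's code; the journal, directory, and function names are illustrative only). Every structural change is first recorded in an intent log, so that a replay after a crash can complete half-finished changes instead of leaving the structure damaged:

    journal = []        # the intent log: written before a structural change is applied
    directory = {}      # stands in for the on-disk filesystem structure

    def create_file(name):
        journal.append(("create", name))    # 1. record the intent
        directory[name] = {"size": 0}       # 2. apply the structural change
        journal.remove(("create", name))    # 3. retire the entry once the change is complete

    def replay(log, fs):
        """After a crash and reboot, redo any logged change that never completed."""
        for op, name in list(log):
            if op == "create" and name not in fs:
                fs[name] = {"size": 0}

    # Simulated crash between steps 1 and 2: the intent is still in the journal,
    # so replay() completes the change and the structure stays consistent.
    journal.append(("create", "report.txt"))
    replay(journal, directory)
    assert "report.txt" in directory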


Storage Network Infrastructure Layer

This layer refers to storage network connectivity. This layer is becoming more and more of a concern to the modern
enterprise. Originally, most environments simply connected a server to a storage device through a SCSI connection.
Now, organizations are using other more advanced network connection technology such as Fibre Channel
technology and storage area networking (SAN). Rather than viewing this layer simply as a server connecting to a
piece of storage, you should consider multiple paths between servers and storage. You need to investigate the
possibility of implementing some sort of network redundancy to ensure that if you lose an access route between the
system and storage, there is another access path available.

Data Storage Layer

In addition to application availability, managing storage effectively, and ensuring that you maintain network
connectivity, there are data availability concerns in the storage pool itself. In this layer, you can enable online,
dynamic reconfiguration of the storage pool. You need to account for growth and scalability. No matter how many disk
arrays you have, you will inevitably require more in the future. You should also consider the capacity management
aspects of your storage devices and determine how to optimize storage space across common disk hardware.

General RAID Levels


RAID is an array of disks in which redundant data is stored in different places on multiple disks. The redundant
information enables regeneration of user data in the event that one of the disks in the array or the access data path to
it fails. By placing data on multiple disks, I/O operations can overlap in a balanced way, improving performance.
RAID also increases the MTBF and fault tolerance. RAID employs disk striping, or the partitioning of each drive's
storage space into units. The stripes of all the disks are interleaved and addressed in order. In this topic, you learn
about the various RAID levels.

General RAID Levels


There are five basic RAID levels that are commonly recognized. In addition, there are several other RAID levels
that are less common variations on these five basic levels. There are also several common RAID combinations that
can also be configured. The most appropriate RAID configuration for a specific filesystem or database tablespace
must be determined based on data access patterns and an appropriate trade-off between cost and
performance.
RAID-0 (striping)

This RAID level features disk striping, but no redundancy of data. In this configuration,
a collection of data is divided into small chunks that are written to a separate disk in the
array. This RAID level supplies performance acceleration at no increased storage cost,
because individual disks can perform concurrent write operations. RAID-0 offers no
increase in data availability. In fact, if implemented by itself, RAID-0 decreases overall
data availability. This is because for one disk to function, all the other disks in the array
must be functioning as well. Any failure of an individual disk in the stripe will result in
the inability to perform any read or write operations in the entire stripe. RAID-0 would be
an option for applications requiring high bandwidth such as video production and editing,
image editing, or pre-press applications.

RAID-1 (mirroring)

RAID-1 requires at least double the disk capacity of RAID-0. In RAID-1, the data is
replicated on a separate disk, or multiple disks. No disk striping occurs. Every byte on
one disk is copied block-for-block on a separate disk that acts as a peer and is completely
in sync with the original disk. In the event of an individual disk failure, the other disk
maintains operation without any service interruption. RAID-1 provides the highest
performance for redundant storage, because it does not require read-modify-write cycles
to update data, and because multiple copies of data can be used to accelerate read-intensive
applications. However, resyncing or creating a new RAID-1 copy requires time
and a significant amount of I/O. Therefore, a disadvantage to RAID-1 is the fact that
write performance may suffer. RAID-1 requires 100% additional disk capacity for each
mirror copy. Therefore, another major disadvantage is cost. This RAID level would be
recommended for applications requiring increased availability such as accounting,
payroll, or other financial applications.

RAID-2 (Hamming encoding)

RAID-2 features disk striping. This RAID level detects errors that occur and determines
which part is in error by using error checking and correcting (ECC) information. RAID-2
detects 2-bit errors and corrects 1-bit errors on the fly. Each data disk has its Hamming
Code ECC information recorded on ECC disks. On read operations, the ECC code
verifies data or corrects single disk errors. You need a high ratio of ECC disks to data
disks with smaller word sizes. It has no clear advantages over RAID-3, and is not used in
practice.

RAID-3 (byte striped across a group of disks)

RAID-3 uses disk striping in a parallel fashion with each virtual disk block distributed
across all the disks in the array except for one that stores the parity check. The parity disk
permits the regeneration and rebuilding of data in the event of a disk failure. In RAID-3,
the stripe depth of an N+1 array is equal to 1/N of the virtual block size, and each disk drive must be
on its own separate I/O channel. For example, if the virtual block size for a 4+1 set is
512 bytes, then the stripe depth is 128 bytes (512/4). The RAID volume can only process
one disk I/O at a time. All I/O operations access all disks, because the bytes are
distributed across multiple disks (parallel transfer). For this reason, RAID-3 is best for
applications that are single stream bandwidth-oriented. This would not be a good choice
for a database server, because databases tend to read and write smaller blocks. RAID-3 is
likely to perform significantly better in a controller-based implementation.

RAID-4 (dedicated parity disk)

RAID-4 uses large stripes, and dedicates one drive to storing parity information. RAID-4
is very similar to RAID-3. The major difference is that where in a RAID-3 array, the
stripe and logical block size are equal, RAID-4 arrays implement variable stripe sizes. In
RAID-4, the stripe depth is an integer multiple of the virtual block size. This means that
multiple virtual blocks can be placed within a single stripe in the RAID-4 array. You can
read records from any single drive. This enables you to take advantage of overlapped I/O
for read operations. Since all write operations have to update the parity drive, no I/O
overlapping is possible. RAID-4 offers no advantage over RAID-5. As with RAID-3, a
RAID-4 implementation is ideal for systems performing large file transfers. It does not
perform well when used in applications that require small file writes at high I/O rates.

RAID-5 (block striped across a group of disks)

RAID-5 removes a possible bottleneck on the parity drive by rotating parity across all
drives in the set. RAID-5 requires at least three and usually five disks for the array. All
read and write operations can be overlapped. RAID-5 stores parity information but not
redundant data. Recovery from a RAID-5 disk failure requires a complete read of all the
disks in the stripe. The recovery process can be time-consuming and system performance
will suffer during recovery. This is the most complex and versatile of the basic RAID
architectures. RAID-5 is best suited for file and application servers, database servers in a
datawarehousing environment, Web servers, and e-mail servers.
The performance overhead for writes can be substantial in a RAID-5 configuration,
because a write can involve much more than simply writing to a data block. A write can
involve reading the old data and parity, computing the new parity, and writing the new
data and parity.
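The parity mechanism can be illustrated with a short Python sketch (a conceptual illustration, not driver code; the block values are made up). The parity block is the XOR of the data blocks, so any single missing block can be rebuilt by XOR-ing the surviving blocks with the parity:

    def compute_parity(blocks):
        """XOR a list of equal-sized data blocks (bytes) into one parity block."""
        parity = bytearray(len(blocks[0]))
        for block in blocks:
            for i, byte in enumerate(block):
                parity[i] ^= byte
        return bytes(parity)

    def rebuild_block(surviving_blocks, parity):
        """Recreate a missing data block from the surviving blocks and the parity."""
        return compute_parity(list(surviving_blocks) + [parity])

    data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]      # four data columns of one stripe
    parity = compute_parity(data)                    # the fifth, parity column

    lost = data[2]                                   # pretend the disk holding column 3 failed
    rebuilt = rebuild_block(data[:2] + data[3:], parity)
    assert rebuilt == lost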
RAID Level Variations

RAID-6
RAID-6 is similar to RAID-5, but with additional independently computed check data. It includes a second
parity scheme that is distributed across different drives and offers very high fault-tolerance. Currently, there
are very few commercial examples of RAID-6.
RAID-7
RAID-7 includes a real-time embedded operating system as a controller, caching data through a high-speed
bus, and other characteristics of a stand-alone computer. This RAID level is not common.

RAID Combinations

RAID-01 (mirrored stripes)

RAID-01 is a mirrored RAID-1 pair made from two RAID-0 stripe sets. It is configured
by creating two RAID-0 sets and adding RAID-1. If you lose a drive on one side of a
RAID-01 array, then lose another drive on the other side of that array before the first side
is recovered, you will suffer complete data loss. It is also important to note that in the
event of a single disk failure, all drives in the surviving mirror are involved in rebuilding
the entire damaged stripe set. Performance is severely degraded during
recovery unless the RAID subsystem allows adjusting the priority of recovery. However,
shifting the priority toward production will lengthen recovery time and increase the risk
of the kind of catastrophic data loss mentioned earlier.
Example of RAID-01 Failure

In this example, if Disks A and D fail, all the data in the array becomes unavailable.
RAID-10 (striped mirrors)

RAID-10 is a stripe set made up from a number of mirrored pairs. Only the loss of both drives in the same
mirrored pair can result in any data loss, and after a first failure, the loss of that drive's specific mirror partner is
only 1/Nth as likely as the loss of some drive on the opposite mirror in RAID-01. Recovery involves only the
replacement drive and its mirror, so the rest of the array performs at 100% capacity during recovery. Since only the
single drive needs recovery bandwidth, bandwidth requirements during recovery are lower and recovery takes far
less time, reducing the risk of catastrophic data loss. The performance of RAID-10 and RAID-01 is identical, but
they have different levels of data integrity.
Example of RAID-10 Failure

In this example, first Disk A fails and all the other disks are still available. If Disk D then fails, only the data on
disks A and D is offline.
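The difference in data integrity between the two combinations can be checked by enumerating every possible two-disk failure for a hypothetical four-disk array. The pairings below (A mirrors B and C mirrors D in the RAID-10 case; striped plexes A+B and C+D in the RAID-01 case) are illustrative assumptions, not taken from a specific product:

    from itertools import combinations

    disks = ["A", "B", "C", "D"]
    two_disk_failures = [set(c) for c in combinations(disks, 2)]    # the 6 possible pairs

    # RAID-01 (mirrored stripes): plex 1 stripes A+B, plex 2 stripes C+D; the data
    # survives only if at least one whole plex is left intact.
    def raid01_survives(failed):
        return any(plex.isdisjoint(failed) for plex in ({"A", "B"}, {"C", "D"}))

    # RAID-10 (striped mirrors): A mirrors B and C mirrors D; the data survives as
    # long as no mirrored pair loses both of its members.
    def raid10_survives(failed):
        return all(not pair.issubset(failed) for pair in ({"A", "B"}, {"C", "D"}))

    for name, survives in (("RAID-01", raid01_survives), ("RAID-10", raid10_survives)):
        ok = sum(survives(f) for f in two_disk_failures)
        print(name, "survives", ok, "of", len(two_disk_failures), "two-disk failures")
    # RAID-01 survives 2 of 6 double failures; RAID-10 survives 4 of 6.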
RAID-53
RAID-53 offers an array of stripes in which each stripe is a RAID-3 array of disks. This offers higher performance
than RAID-3, but at much higher cost, and it requires at least five drives.


Software RAID vs. Hardware RAID


The basic characteristics and configurations of RAID levels are the same in both software and hardware RAID. The
main difference is the point at which the disk management operations occur. This topic identifies the advantages and
disadvantages of software and hardware RAID.

Hardware RAID (Controller-Based)


In hardware RAID, the management operations required to implement the RAID disk array occur within the disk
array itself. The host system does not perform the operations, but an interface program runs on the host system that
enables you to monitor the disk management operations. A hardware RAID operation creates a logical unit (LUN)
that can be monitored regardless of the operating system of the host system. It is often a safe assumption that the
disks are managed properly, no matter what the RAID level, within a hardware RAID configuration. A hardware
RAID system is basically a specialized, single-purpose system that features a controller that does nothing but
aggregate storage disks, stripe and mirror data across these disks, and calculate parity.
The advantages of using hardware RAID over software RAID:
Increased performance on the host system
Performance is increased because the disk management operations are off-loaded onto the disk array. For
example, in a mirrored controller-based configuration, the host would need to pass only one write request
through the disk driver and across the I/O bus, where the controller would decompose it into two separate
writes.
Enhanced features
Hardware RAID manufacturers often add enhanced functionality to their hardware. Such enhancements
include additional internal memory in the disk array, and the abilities to replicate data over a WAN, share
specific disks between multiple host systems, and lock out other hosts while a single host is accessing a
disk. Enterprise class hardware RAID systems also often include redundant power supplies and cooling
fans.
Efficiency
Hardware RAID systems tend to be very efficient because they feature hardware that is only concerned
with performing RAID operations. The RAID controller does not have to concern itself with graphic user
interfaces (GUIs) and other aspects of a general purpose operating system.
The disadvantages of using hardware RAID over software RAID:
Dependence on one RAID hardware vendor
Every RAID manufacturer uses a different management interface, and once you familiarize yourself with one,
it will be difficult to switch to a different vendor.
Inability to combine disks from different arrays into a single array
This will create another SPOF in the system.
Hard to resize LUNs.
In most cases, once a LUN is full, you cannot simply increase the size of the LUN to accommodate new
data. You have to destroy the original LUN, create another larger LUN, and then restore the original data to
the new LUN.
Hardware limits on the number and size of LUNs
Often, RAID vendors will enforce some hardware limits that might limit your ability to configure your
system for optimal performance.
Cost
Hardware RAID is more expensive than software RAID.
No inter-box protection
A specific RAID controller has no visibility to other RAID boxes or storage devices.

Software RAID (Host-Based)


Rather than utilizing a dedicated hardware controller to perform the various management operations required to
implement a RAID array, in software RAID the operations are performed by the host system processor using special
software. Disk array management is a somewhat low level activity that is performed underneath the other
applications that run on the host system. Therefore, software RAID is usually implemented at the operating system
level. Software RAID is supported on Windows NT and 2000 platforms, as well as a majority of the various UNIX
platforms. The output of software RAID is a logical volume. A volume is a logical object on which file systems are
written or to which databases write their data.
Advantages of using software RAID over hardware RAID:
Cost
If you are already running an operating system that supports software RAID, you have no additional costs
for controller hardware. However, you may be required to add more system memory.
Simplicity
You are not required to install, configure, or manage a hardware RAID controller.
Flexibility in hardware
By moving the management operations off the hardware, you are allowed more flexibility in selecting
appropriate hardware. In fact, you can use a wide range of online storage, such as just a bunch of disks
(JBOD), enterprise RAID, and a smaller RAID system.
Flexibility in disk configuration
Software RAID implementations can build RAID objects from partitions of disks rather than being
restricted to whole disks, so they can use a disk pool to meet a diverse set of performance and availability
requirements. For instance, one might create a small high-performance striped file system by using only a
few cylinders on a very large number of drives, and use the remaining space on those same drives for
concatenated, mirrored, or RAID-5 volumes with different I/O characteristics.
Increased redundancy
A duplexed RAID-1 array can sometimes be implemented in software RAID, but not in hardware RAID,
depending on the controller. Building redundant layouts using disks with separate connections to the host
can enhance availability, eliminating the single points of failure introduced by non-redundant host
connections.

The disadvantages of using software RAID over hardware RAID:


Performance
The most significant drawback of software RAID is that it provides lower overall system performance than
hardware RAID. Cycles are taken from the CPU of the host system to manage the RAID array. In reality,
the impact of these operations is not that excessive for simple RAID levels like RAID-1. However, the
impact on performance can be substantial, particularly with any RAID levels that involve striping with
parity, such as RAID-5.
Boot volume limitations
The operating system cannot boot from the RAID array, because the operating system has to be
running to enable the RAID array. A separate partition needs to be created for the operating system. This
segments the system capacity, lowers the performance, and increases the time required to boot the system.
RAID level limitations
Software RAID is usually limited to RAID-0, RAID-1, RAID-5, RAID-01, and RAID-10.
Advanced feature support
Software RAID normally does not include support for the advanced features that may be available to
hardware RAID arrays.
Operating system (OS) compatibility issues
Generally, if you enable software RAID by using a particular operating system, only that particular
operating system can access that array. This creates problems with multiple-OS environments.
Software compatibility issues
Some software utilities, such as partitioning and formatting utilities, may have conflicts with software
RAID arrays.
Reliability
Implementing software RAID increases the chance of potential bugs that might compromise the integrity
and reliability of the array.

Combining Software and Hardware RAID


You should not consider software and hardware RAID to be mutually exclusive choices. Host-based
volume management offers all the advantages of software RAID to complement hardware RAID systems. By
combining hardware and software RAID, you can realize the best features of both solutions: the off-loaded
processing, reduced I/O transfer requirements, and redundant components of most hardware RAID subsystems,
coupled with the configuration flexibility added by the inclusion of software-based RAID. Combining hardware and
software RAID solutions offers several key benefits:
Increased availability
Many hardware RAID solutions retain single points of failure (SPOFs), allowing data to become
unavailable if a non-disk component of the array fails. When software RAID is used to build configurations
that incorporate hardware RAID units in separate arrays, many of these vulnerabilities can be eliminated.
Increased performance
A single hardware RAID controller may present a bottleneck to data access because of limited array bus
and host-to-array bandwidth, as well as CPU cycles needed for parity calculations. Efficient controller-based
algorithms can be combined with multiple host connections and supplementary software RAID processing to
increase bandwidth and throughput.
Improved manageability
The limited set of configuration options and the static configuration utilities for hardware RAID
subsystems may make initial setup seem simpler than setting up a software RAID configuration. However,
after running the system, the configuration may need to be modified to reflect the actual I/O pattern of the
applications. With a controller-based setup, this is usually achieved by backing up the data, reconfiguring
the array, and reloading the data. This requires interruption of data access. The on-line reconfiguration
capabilities of most software RAID solutions can be used to enhance the performance monitoring, tuning,
and reconfiguration of hardware RAID, simplifying administration while increasing uptime and
performance.


Defining a Volume
The basis for any volume management solution is a volume. This topic defines a volume and identifies the
advantages of using volumes to manage storage.

What Is a Volume?

Volumes enable an application to view a number of disks as a single logical unit, regardless of the physical location of
the disks. This volume has the performance, reliability, and other attributes of its individual components. Each
volume records and retrieves data from one or more physical disks. Volumes are accessed by file systems, databases,
or other applications in the same way that physical disks are accessed. Volumes are also composed of other virtual
objects that are used to change the volume configuration. Volumes and their virtual components are called virtual
objects. Volumes can be used to perform administrative tasks on disks without interrupting applications and users.

Advantages of Volumes
There are several advantages to using volumes:
Ability to combine RAID levels
Volumes enable you to combine any number of different RAID levels. For example, if the important
consideration is cost, you might implement a RAID-5 solution. Alternatively, if you require very
high performance, then you might use striped mirrors.
Scalability
Virtual volumes also offer the flexibility to grow the storage capacity without disrupting the system. Instead
of taking the server off-line or physically moving data from point A to point B, you can simply add more
storage to the volume.
Increased performance and failure tolerance
You can combine enterprise RAID and JBOD and your system will feature the advantages of both. You can
take advantage of a hardware controller and the flexibility of host-based volume management.

VERITAS Volume Management: Virtual Objects


There are several basic methods to manage online storage to increase data availability. Before you can understand
the specific principles involved in each of these methods, it is important to define some of the basic virtual objects
and their relationships to each other. This topic provides an overview of VERITAS Volume Manager (VxVM) and
describes the relationships between the various VxVM objects.

Overview of VxVM
VxVM provides easy-to-use online disk storage management for computing environments. Traditional disk storage
management often requires that systems be taken offline at a major inconvenience to users. VxVM provides the
tools to improve performance and ensure data availability and integrity. VxVM also enables you to dynamically
configure disk storage while the system is active.
The connection between physical objects and VxVM objects is made when you place a physical disk under VxVM
control. VxVM creates virtual objects and makes logical connections between the objects. The virtual objects are
then used by VxVM to perform storage management tasks. VxVM objects include:
VxVM disks

When you place a physical disk under VxVM control, a VxVM disk is assigned to the
physical disk. Each VxVM disk corresponds to at least one physical disk. A VxVM disk
typically includes a public region where user data is stored, and a private region where
VxVM internal configuration information is stored.

Disk groups

A disk group is a collection of VxVM disks. You group disks into disk groups for
management purposes, such as to hold the data for a specific application or set of
applications. For example, data for accounting applications can be organized in a disk
group called "acctdg".

A disk group configuration is a set of records with detailed information about related
VxVM objects, their attributes, and their connections. Disk groups are configured by the
system administrator and represent management and configuration boundaries. You can
create additional disk groups as necessary. Disk groups allow you to group disks into
logical collections. Disk groups enable high availability, because a disk group and its
components can be moved as a unit from one host system to another. Disk drives can be
shared by two or more hosts, but accessed by only one host at a time. If one host crashes,
the other host can take over the failed host's disk drives, as well as its disk groups.

Subdisks

A subdisk is a set of contiguous disk blocks. VxVM allocates disk space by dividing a
VxVM disk into one or more subdisks. Each subdisk represents a specific portion of a
VxVM disk, which is mapped to a specific region of a physical disk. A VxVM disk can
contain multiple subdisks, but subdisks cannot overlap or share the same portions of a
VxVM disk.

Plexes (mirrors)

VxVM uses subdisks to build virtual objects called plexes (or mirrors). A plex consists of
one or more subdisks located on one or more physical disks. To organize data on the
subdisks to form a plex, use the following methods:

Concatenation
Striping (RAID-0)
Mirroring (RAID-1)
Striping with parity (RAID-5)
Volumes

A volume consists of one or more plexes, each holding a copy of the data in the volume.
Due to its virtual nature, a volume is not restricted to a particular disk or a specific area of
a disk. The configuration of a volume can be changed by using the VxVM user interfaces.
Configuration changes can be done without causing disruption to applications or file
systems that are using the volume. For example, a volume can be mirrored on separate
disks or moved to use different disk storage. A volume can consist of up to 32 plexes,
each of which contains one or more subdisks. A volume must have at least one associated
plex that has a complete copy of the data in the volume with at least one associated
subdisk.
VxVM Object Relationships
VxVM virtual objects are combined to build volumes. The virtual objects contained in volumes are:
VxVM disks
Disk groups
Subdisks
Plexes
Volume Manager objects have the following connections:
VxVM disks are grouped into disk groups.
One or more subdisks (each representing a specific region of a disk) are combined to form plexes.
A volume is composed of one or more plexes.
In this example, a disk group has two VxVM disks. One disk has a volume with one plex and two subdisks. The
other disk has a volume with one plex and a single subdisk.
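The nesting described above can be pictured as a small, hypothetical data model (plain Python, not VxVM code; the object and attribute names are illustrative). It reproduces the example: one disk group with two VxVM disks, one volume with a single plex of two subdisks, and another volume with a single plex of one subdisk.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Subdisk:
        name: str
        disk: str           # the VxVM disk the blocks come from
        offset_mb: int      # start of the region on that disk
        length_mb: int

    @dataclass
    class Plex:
        name: str
        layout: str                                        # "concat", "stripe", "raid5", ...
        subdisks: List[Subdisk] = field(default_factory=list)

    @dataclass
    class Volume:
        name: str
        plexes: List[Plex] = field(default_factory=list)   # each plex holds a copy of the data

    @dataclass
    class DiskGroup:
        name: str
        disks: List[str] = field(default_factory=list)
        volumes: List[Volume] = field(default_factory=list)

    acctdg = DiskGroup(
        name="acctdg",
        disks=["acctdg01", "acctdg02"],
        volumes=[
            Volume("vol01", [Plex("vol01-01", "concat",
                                  [Subdisk("acctdg01-01", "acctdg01", 0, 512),
                                   Subdisk("acctdg01-02", "acctdg01", 512, 512)])]),
            Volume("vol02", [Plex("vol02-01", "concat",
                                  [Subdisk("acctdg02-01", "acctdg02", 0, 1024)])]),
        ],
    )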


VERITAS Volume Management: Volume Layouts


A volume's layout refers to the organization of plexes in a volume. Volume layout is the way plexes are configured
to remap the volume address space through which I/O is redirected at run-time. Volume layouts are based on the
concept of disk spanning, which is the ability to logically combine physical disks in order to store data across
multiple disks. This topic identifies the volume layouts that are available in VERITAS Volume Manager and relates
these layouts to the appropriate, general RAID level.

Common Volume Layouts Available in VxVM


A variety of volume layouts are available, and each layout has different advantages and disadvantages. The layouts
that you choose depend on the levels of performance and reliability required by your system.
With VxVM, you can change the volume layout without disrupting applications or file systems that are using the
volume. A volume layout can be configured, reconfigured, resized, and tuned while the volume remains accessible.
Common volume layouts include:
Concatenated (No RAID)
Mirrored (RAID-1)
Striped (RAID-0)
RAID-5
Layered volumes (RAID-01 and RAID-10)

Concatenated Layout
A concatenated volume layout maps data in a linear manner onto one or more subdisks in a plex. Subdisks do not
have to be physically contiguous and can belong to more than one VxVM disk. Storage is allocated completely from
one subdisk before using the next subdisk in the span. Data is accessed in the remaining subdisks sequentially until
the end of the last subdisk. For example, if you have 14GB of data, then a concatenated volume can logically map
the volume address space across subdisks on different disks. The addresses 0GB up to 8GB of the volume address space
map to the first 8-gigabyte subdisk, and addresses 8GB up to 14GB map to the second 6-gigabyte subdisk. An address
offset of 12GB therefore maps to an address offset of 4GB in the second subdisk.
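The translation in this example can be sketched in a few lines of Python (an illustration only, assuming the two concatenated subdisks of 8GB and 6GB described above):

    subdisks = [("subdisk-1", 8), ("subdisk-2", 6)]     # (name, size in GB), concatenated in order

    def locate(volume_offset_gb):
        """Map a volume address offset to (subdisk, offset within that subdisk)."""
        remaining = volume_offset_gb
        for name, size in subdisks:
            if remaining < size:
                return name, remaining
            remaining -= size
        raise ValueError("offset beyond the end of the volume")

    print(locate(12))   # ('subdisk-2', 4): offset 12GB falls 4GB into the second subdisk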

Concatenation removes the restriction on size of storage devices imposed by physical disk size. It also enables
better utilization of free space on disks by providing for the ordering of available discrete disk space on multiple
disks into a single addressable volume. In addition, large file systems can be created to reduce overall system
administration complexity. However, concatenation does not protect against disk failure. A single disk failure may
result in the failure of the entire volume.

Striped Layout
A striped volume layout maps data so that the data is interleaved, or allocated in stripes, among two or more
subdisks on two or more physical disks. Data is allocated alternately and evenly to the subdisks of a striped plex.

The subdisks are grouped into "columns". Each column contains one or more subdisks and can be derived from one
or more physical disks. To obtain the performance benefits of striping, each column within a striped volume should
not be allocated space from any disk used by any other column within that volume.
All columns must be the same size. The size of a column should equal the size of the volume divided by the number
of columns.
Data is allocated in equal-sized units, called stripe units, that are interleaved between the columns. Each stripe unit is
a set of contiguous blocks on a disk. The stripe unit size can be in units of sectors, kilobytes, megabytes, or
gigabytes. The default stripe unit size is 128 sectors (64K), which provides adequate performance for most general
purpose volumes. Performance of an individual volume may be improved by matching the stripe unit size to the I/O
characteristics of the application using the volume.
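How a striped plex spreads addresses across its columns can be sketched as follows (a hypothetical illustration assuming three columns and the default 64K stripe unit mentioned above):

    STRIPE_UNIT_KB = 64     # the default stripe unit size (128 sectors)
    COLUMNS = 3             # assumed column count for this illustration

    def locate(volume_offset_kb):
        """Return (column, offset within that column) for a volume offset given in KB."""
        stripe_unit = volume_offset_kb // STRIPE_UNIT_KB          # which stripe unit overall
        column = stripe_unit % COLUMNS                            # units are interleaved round-robin
        unit_within_column = stripe_unit // COLUMNS               # how far down that column
        return column, unit_within_column * STRIPE_UNIT_KB + volume_offset_kb % STRIPE_UNIT_KB

    print(locate(0))      # (0, 0): the first 64K goes to column 0
    print(locate(64))     # (1, 0): the next 64K goes to column 1
    print(locate(200))    # (0, 72): 200K falls in the fourth stripe unit, 8K into column 0's second unit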

Mirrored Layout
By adding a mirror to a concatenated or striped volume, you create a mirrored layout. A mirrored volume layout
consists of more than one plex, each of which is a duplicate of the information contained in the volume. Each plex in a
mirrored layout contains an identical copy of the volume data. In the event of a physical disk failure, when the plex on
the failed disk becomes unavailable, the system can continue to operate using the unaffected mirrors.

Although a volume can have a single plex, at least two plexes are required to provide redundancy of data. Each of
these plexes must contain disk space from different disks to achieve redundancy.
Volume Manager uses true mirrors, which means that all copies of the data are the same at all times. When a write
occurs to a volume, all plexes must receive the write before the write is considered complete.
Each plex in a mirrored configuration can have a different layout. For example, one plex can be concatenated and the
other plex can be striped. You should distribute mirrors across all types of hardware to prevent the loss of more than
one copy of the data in case of a single point of failure.

RAID-5 Layout
A RAID-5 volume layout has the same attributes as a striped plex, but includes one additional column of data that is
used for parity. Parity provides redundancy.

Parity is a calculated value used to reconstruct data after a failure. While data is being written to a RAID-5 volume,
parity is calculated by doing an exclusive OR (XOR) procedure on the data. The resulting parity is then written to
the volume. If a portion of a RAID-5 volume fails, the data that was on that portion of the failed volume can be
recreated from the remaining data and parity information.
RAID-5 volumes keep a copy of the data and calculated parity in a plex that is striped across multiple disks. Parity is
spread equally across disks. Given a 5-column RAID-5 where each column is 1G in size, the RAID-5 volume size
is 4G.
One column of space is devoted to parity, and the remaining four 1G columns are used for data.
The default stripe unit size for a RAID-5 volume is 32 sectors (16K). Each column must be the same length but may
be made from multiple subdisks of variable length. Subdisks used in different columns must not be located on the
same physical disk.
RAID-5 requires a minimum of three disks for data and parity. When implemented as recommended, an additional
disk is required for the RAID-5 log.
RAID-5 cannot be mirrored.
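The sizing rule in this example can be checked with a quick calculation: a RAID-5 volume yields (columns - 1) columns' worth of usable space, because one column's worth is consumed by parity. A minimal Python sketch:

    def raid5_usable_gb(columns, column_size_gb):
        # usable space = all columns minus the equivalent of one column of parity
        return (columns - 1) * column_size_gb

    print(raid5_usable_gb(5, 1))    # 4 -> a 5-column RAID-5 of 1G columns yields a 4G volume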

Layered Volume Layout


A layered volume is a virtual Volume Manager object that nests volumes within volumes to create more complex
volume structures that mirror data at a more granular level.
With this new method of mirroring, data is mirrored at the column or subdisk level. Loss of a disk results in the loss
of a copy of a column or subdisk within a plex. Further disk losses may occur without affecting the complete
volume. Only the data contents of the column or subdisk affected by the loss of the disk need to be recovered.

Stripe-Mirror (RAID-10)

This example illustrates a layered volume layout called a stripe-mirror layout. In this layout, VxVM creates
underlying volumes that mirror each subdisk. Each of these underlying volumes is used as a subvolume to create a
top-level volume that contains a striped plex of the data.
If two drives fail in a four-disk stripe-mirror, the volume survives 4 out of 6 (2/3) of the possible two-disk failure
combinations. In other words, the use of layered volumes reduces the risk of data loss from a double failure by 50%
compared to the equivalent mirror-stripe layout.
If a disk fails in a stripe-mirror layout, only the failing subdisk must be detached, and only that portion of the
volume loses redundancy. When the disk is replaced, only a portion of the volume needs to be recovered, which
takes less time.
Mirror-Stripe (RAID-01)

This layout mirrors data across striped plexes. The striped plexes can be made up of different numbers of subdisks.
In the example, plexes are mirrors of each other; each plex is striped across the same number of subdisks. Each
striped plex can have different numbers of columns and different stripe unit sizes. One plex could also be
concatenated.

When you create a volume that is less than one gigabyte in size, a nonlayered mirrored volume is created by default.
Nonlayered, mirrored layouts are recommended if you are using less than 1GB of space, or using a single drive for
each copy of the data.
How Do Layered Volumes Work?
In a regular mirrored volume, subdisks originate from the disk media. In a layered volume, the subdisks originate
from underlying volumes. These subdisks are also called subvolumes. Subvolumes and subdisks are equivalent
objects in terms of constructing a volume. In a layered volume, only the top-level volume is accessible as a device
for use by applications.
Layered volumes tolerate disk failure better than non-layered volumes and provide improved data redundancy. If a
disk in a layered volume fails, a smaller portion of the redundancy is lost and recovery and resynchronization time is
usually quicker than it would be for a nonlayered volume that spans multiple drives.
Attribute                                         Stripe-Mirror Volume                     Mirror-Stripe Volume

Recovery of a single subdisk failure              Only the lower plex, not the             The entire plex (full volume
requires resynchronization of:                    top-level plex.                          contents) that contains the subdisk.

For example, at 10 MB per second, the             75 seconds (both subvolumes can          150 seconds.
time it will take to resynchronize the            be synchronized at the same time).
mirror is:
Layered volumes consist of more VxVM objects than nonlayered volumes. Therefore, layered volumes may fill up
the disk group configuration database sooner than nonlayered volumes. When the configuration database is full, you
cannot create more volumes in the disk group.

Volume Management: Hot Relocation


Your system can be protected from the impact of disk failure through a process called hot relocation. Hot relocation
is a feature of VxVM that automatically detects disk failures and restores redundancy to failed VxVM objects by
moving subdisks from failed disks to other disks. When hot relocation is enabled, the system administrator is
notified by email about disk failures. This topic describes the hot relocation process.

Disk Failures
Disk failures can be classified into two general categories:
Permanent disk failure

When a disk is corrupted and no longer usable, the disk must be logically and physically removed, and
then replaced with a new disk. With permanent disk failure, data on the disk is lost.

Temporary disk failure


When communication to a disk is interrupted, but the disk is not damaged, the disk can be logically
removed, then reattached as the replacement disk. With temporary (or intermittent) disk failure, data still
exists on the disk.

What Is Hot Relocation?

Hot relocation is a feature of VxVM that enables a system to automatically react to I/O failures on redundant
VxVM objects and restore redundancy and access to those objects. VxVM detects I/O failures on objects and
relocates the affected subdisks. The subdisks are relocated to disks designated as spare disks or to free space within
the disk group. VxVM then reconstructs the objects that existed before the failure and makes them redundant and
accessible again.
Partial Disk Failure
A partial disk failure is a failure that affects only some subdisks on a disk. When a partial disk failure occurs,
redundant data on the failed portion of the disk is relocated. Existing volumes on the unaffected portions of the disk
remain accessible. With partial disk failure, the disk is not removed from VxVM control. Before removing a failing
disk for replacement, you must evacuate any remaining volumes on the disk.

How Does Hot Relocation Work?


The hot relocation feature is enabled by default. No system administrator action is needed to start hot relocation
when a failure occurs.
The vxrelocd service, or daemon, starts during system startup and monitors VxVM for failures involving disks,
plexes, or RAID-5 subdisks. When a failure occurs, vxrelocd triggers a hot relocation attempt and notifies the
system administrator, through email, of failures and any relocation and recovery actions.

A successful hot relocation process involves:


1. Failure detection
Detecting the failure of a disk, plex, or RAID-5 subdisk.
2. Notification
Notifying the system administrator and other designated users and identifying the affected Volume Manager
objects.
3. Relocation
Determining which subdisks can be relocated, finding space for those subdisks in the disk group, and
relocating the subdisks. The system administrator is notified of the success or failure of these actions. Hot
relocation does not guarantee the same layout of data or the same performance after relocation.
4. Recovery
Initiating recovery procedures, if necessary, to restore the volumes and data. Again, the system
administrator is notified of the recovery attempt.
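The four steps can be summarized in a conceptual sketch (plain Python, not the actual vxrelocd implementation; the disk and subdisk names are made up):

    def hot_relocate(failed_disk, subdisk_map, spare_disks, notify=print):
        """subdisk_map maps subdisk name -> disk name; spare_disks is a list of spare disk names."""
        # 1. Failure detection is assumed to have happened in the monitoring daemon.
        affected = [sd for sd, disk in subdisk_map.items() if disk == failed_disk]

        # 2. Notification of the failure and the affected objects.
        notify("Disk %s failed; affected subdisks: %s" % (failed_disk, affected))

        # 3. Relocation of each affected subdisk onto a spare disk (or free space).
        for subdisk, spare in zip(affected, spare_disks):
            subdisk_map[subdisk] = spare
            notify("Relocated %s to %s" % (subdisk, spare))

        # 4. Recovery would now resynchronize the moved subdisks from surviving
        #    mirrors or parity; represented here only by a final notification.
        notify("Recovery started for relocated subdisks")

    hot_relocate("disk03", {"vol01-03": "disk03", "vol01-04": "disk04"}, ["spare01"])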

Fault Resilient Clustering Concepts

A fault resilient cluster features at least one machine that is configured to assume responsibility for a failed server.
When one machine in the pair fails, its services are moved to the second server. This is called failover. Failover is
defined as the migration of services from one server to another. In a fault resilient cluster, a significant outage of
your primary server will have little impact on your users. Software can be added to hardware clustering solutions to
provide 99.99% data availability. This accounts for only 53 minutes of downtime per year. In most instances, only
seconds or minutes are lost. This topic describes the general characteristics of fault resilient HA clusters.
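The downtime figure quoted above follows directly from the availability percentage, as this quick calculation shows:

    def downtime_minutes_per_year(availability_percent):
        return (1 - availability_percent / 100) * 365.25 * 24 * 60

    print(round(downtime_minutes_per_year(99.99)))    # ~53 minutes of downtime per year
    print(round(downtime_minutes_per_year(99.9)))     # ~526 minutes (roughly 8.8 hours) per year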

Fault Resilient Failover


From an application and service viewpoint, it is important to view a cluster as a collection of servers that are
configured to fail over to one another in the event of a fault or failure. The servers are seen as the service or application
providers, or as a delivery system. It does not matter which server is providing the service to the user, as long as the
service is running. In the event of a failure, there may be a temporary loss of access by the user, but the system
should be able to recover quickly.
With few exceptions, nearly every environment can benefit from fault resilient clusters. If system failures cost your
company any money at all in either hard costs or customer perception, a fault resilient HA solution is most likely the
appropriate solution.
Characteristics of a Fault Resilient Pair or Cluster
A fault resilient HA system features:
Redundant components in separate servers
You require at least two servers that are similarly configured.
Multiple independent copies of the operating system
Each server in the pair or cluster should be running the same version of operating system. They should each
feature unshared, independent disks that contain not only the operating system, but also any software
required for the failover process.
No single point of failure (SPOF)
A SPOF in any server in a fault resilient pair or cluster can cause the failure of the whole pair or cluster and
should be eliminated.
Commercially available failover management software (FMS)

A well-tested and robust FMS, such as VERITAS Cluster Server, supports all common networks, databases,
and applications, and features many advantages over other options for monitoring and managing failovers
in a fault resilient pair or cluster.
Support for planned maintenance
Fault resilient solutions support planned maintenance of OS software, applications, or hardware. When a
system is brought offline for maintenance, other systems can immediately take over services to ensure that
the failover is completely transparent to users.
Minimal effects of failover on users
The effect of a fault or failure should be almost completely transparent to your users. The most intrusive
effect that a failover can have on a user in a fault resilient system is a simple reboot of the client machine.
In most cases, even this much of an intrusion is not acceptable. After a failover, the user should not have to
perform any actions to return to work, once the services have been restored by another server in the cluster.
Very quick failover times
Ideally, the failover time in a fault resilient system will be less than 2 minutes. You should always have the
backup server running and have as many system processes active as possible to enable minimal failover
time. The takeover server should never require a reboot in the event of a failover. If this happens, the
failover time can increase to almost an hour in some cases. It is a good idea to create a failover time
expectation for your users.
Minimal hands-on interaction
Ideally, the failover process should never require any sort of human intervention.
Data integrity
To guarantee data integrity, it is required that the servers in a fault resilient cluster share the same storage
disks. After a failover, the user must see the same consistent data that was available to the original server.
These shared disks are critical and should feature some sort of mirrored RAID protection.
Communication networks
Each server in a cluster must continuously monitor the state of the other servers in the cluster. This
is accomplished through a pair of heartbeat networks that run independently of one another.
Another network is required to communicate with the clients or users. This is called the public, or
service, network.
It is not a necessary requirement, but the servers in a fault resilient pair or cluster should also
maintain communication with system administrators. This can be accomplished by a separate
administrative network.

Fault Resilient System Components


Servers

To simplify configuration and administration, all the servers in a fault resilient pair or cluster are completely
identical. This means that they have the same processor type, identical memory, and they are running the same
version of operating system with identical patches. Many system vendors manufacture models that have subtle
differences. It is important that you avoid any incompatibility issues by using identical servers. If you do utilize
different system models, you should use combinations that are proven to be compatible and are well-tested in cluster
environments.
Networks
A fault resilient pair or cluster has three separate levels of network communication:
1. Public network

The public network is the means by which the server pair or cluster communicates with the end users. In
many systems, the network is the least available component. You can determine methods to increase
availability by breaking the public network down into three basic components:

User access devices


These devices include client terminals, PCs, and workstations. For user access devices, there are no special
redundant components that you can use to improve availability. When a user access device fails, the failure
affects a single user. Often a user can still access the system by using another device or accessing a shared
pool of devices.

Local LAN segments


Local LAN devices are generally connected to a backbone network with routers or bridges. It is preferable
to configure parallel access points to the network, especially if you have a large number of users who must
have access to applications at all times. You can implement redundant networks that enable you to switch
the flow of data from one network to another in the event of a loss of network connectivity.

LAN interface components


The servers link to the network through a network interface card (NIC). You should implement
some sort of redundancy at the NIC level to ensure that the servers can connect to the network
even if there is a fault or failure in the NIC. At each cluster node, you should allow for at least two
parallel, independent networking access points. If message traffic is heavy, you may need
additional access points to support message traffic during system failover.
2. Heartbeat networks

Heartbeat networks are the channel through which the servers in a pair or cluster communicate with and monitor each
other. When the heartbeat from a server stops, the cluster assumes that connectivity to that server has been lost.
3. Administrative network

An administrative network is not a required component of a fault resilient cluster.


However, it is a good idea to have a redundant network that is able to be accessed solely
by the administrator. This network enables the administrator to monitor the status of the
servers or system resources, even if other public or private networks in your system fail.

Disks

Private disks
The private disks are unshared, independent disks that contain not only the operating system, but also any
software required for the failover process.
Public disks
Public disks are the shared storage disks that are accessed by the end user. After a failover, the user should
see the same consistent data that was available to the original server. Public disks are critical and should
feature some sort of mirrored RAID protection.

Stages of Failover
There are three basic stages of failover:
Discovery

First, a hardware or software fault occurs. This fault can affect part of one system, an
entire system, or a group of systems. Next, the system recognizes that there has been a downgrade in status.
Some subsystems, such as RAID arrays, may have built-in automatic recovery capabilities. If not, then the
failover process begins.

Notification

In this stage, the system is made aware of the failure. In fault tolerant systems, subassemblies may be
configured to notify their parent assemblies that they have failed. A driver must be written for notification
to take place. In a cluster, once a loss of a resource has been detected, in order to compensate, all systems
are made aware of the loss. This notification must occur even if the network shared by the servers and users
fails. Therefore, a separate private network must be available for inter-server communication. Systems must
have redundant communication methods available. It is important to note that some servers may
continuously monitor each other's ability to communicate. If one server is unavailable to communicate with
the others, the others will assume that the server's resources and services are offline. The servers will notify
each other and failover its services to other servers in the configuration automatically.

Recovery

Once the cluster has responded to the loss of a resource, operators can repair the resource. The cluster should
then be able to restore operations to the state before the failure in a way that is virtually transparent to client
processes.
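The three stages can be pictured with a small conceptual sketch (Python, not VCS code; the timeout value and function names are assumptions) in which a missed heartbeat drives discovery, notification, and recovery:

    import time

    HEARTBEAT_TIMEOUT = 10    # assumed seconds without a heartbeat before a peer is declared down

    def check_peer(last_heartbeat, now, notify, take_over):
        # Discovery: the fault is inferred from the missing heartbeat.
        if now - last_heartbeat <= HEARTBEAT_TIMEOUT:
            return "peer healthy"

        # Notification: the surviving nodes are told that the peer's services are offline.
        notify("peer missed heartbeat; declaring its services offline")

        # Recovery: a surviving node takes over the failed peer's services.
        take_over()
        return "failover initiated"

    print(check_peer(last_heartbeat=time.time() - 30, now=time.time(),
                     notify=print, take_over=lambda: print("importing disk group, starting services")))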

Data Access Models for Fault Resilient Clusters


Clustered servers must cooperate so as not to interfere with each other's accesses to file system metadata or user
data. There are two basic cluster data access models:
Shared nothing
Shared data


Shared Nothing Clusters

In a shared nothing model, each storage device is connected to exactly one node in the cluster. Storage device
ownership may pass from server to server, but a server must relinquish ownership before another can claim a device.
In the shared nothing cluster model, applications running on different servers cannot access the same file systems
concurrently.
Shared nothing clusters enhance the availability of an application. If an application or the server on which it is
executing fails, a failover server takes control of the application's storage devices, and restarts the application
service.
Shared nothing clusters also enable read-only applications to scale beyond the capacity of a single server. Prior to
the Internet explosion, read-only applications were of limited utility. Currently, however, most commercial web
servers are heavily loaded with read-only data. Multiple instances of a read-only web application can run on shared
nothing clustered servers, each accessing its own copy of served web pages. As long as access is read only, there is
no need to synchronize copies of the web pages.
The storage in a shared nothing cluster is not dual-ported. This storage is often mirrored or uses fault-tolerant,
hardware arrays with redundant controllers. This cluster configuration is relevant only for an application which
features a shared-nothing parallel database architecture. Clusters providing highly available data services, such as
Oracle Parallel Server, require physical connections from all nodes to all storage devices, and cannot be configured
in a shared-nothing manner.


Shared Data Clusters

Shared data clusters enhance application availability, and in addition, enable any partitionable application to scale
beyond the capacity of a single server. Shared data clusters provide read-write access to a single copy of data to
multiple application instances executing on different servers. Since all applications access the same copy, all
applications have instant access to all data updates. There are two different access modes in a shared data cluster:
Shared parallel access
In this shared data model, storage devices can be accessed by more than one server at the same time. In the simplest variation of this model, servers share access to storage devices on which they create private, logical volumes.

Shared disk clusters feature a common I/O bus for disk access. Because all nodes can
write to or cache data from the centralized disks at the same time, a synchronization
mechanism must be used to preserve the coherence of the system. Some sort of lock
manager serves this purpose in a shared disk cluster configuration.
A sophisticated shared data model, such as VERITAS SANPoint Foundation Suite HA,
supports concurrent access to file system data by all servers in a cluster.

Shared exclusive access


This model features storage that is dual-ported. However, rather than concurrent storage access, in this model each individual node has exclusive access to the shared storage at any given point in time. In the event of a failure, the services of the faulty server fail over to the other node, which then accesses the same storage as the original server.

Asymmetric 1 to 1 Configurations
This topic describes fault-resilient, asymmetric, 1 to 1 cluster configurations.


Overview of Asymmetric Failover


In an asymmetric failover configuration, one primary server performs the critical processes, while the secondary server is either idle or is running a low-priority application. If the primary server fails, the secondary takes ownership of the shared storage and starts the application. The failover process can also be initiated manually; for example, manual failover would be used if you wanted to perform maintenance or updates on the primary server. Once the fault on the primary server is repaired, the application can fail back to the primary server, either manually or when the now-active server fails.

In this example, a file server application is failed over from the master server to the backup server. Notice that the IP address used by the client systems moves as well. This is extremely important; otherwise all clients would have to be reconfigured after each failover.


Advantages of Asymmetric Pairs


Asymmetric pairs:
Provide very high data availability.
Are relatively easy to configure.
Disadvantages of Asymmetric Pairs
Asymmetric pairs:
Are expensive.
Involve hardware that is used solely for monitoring purposes.
Make it difficult to get budget approval for idle hardware.

Capacity Considerations for Asymmetric Clusters


In asymmetric configurations, all of the systems might not have equivalent capacities. The node to which
applications are failed over may be a smaller system. Suppose an asymmetric cluster contains three nodes: Node1,
Node2, and Node3. Node2 and Node3 have considerably smaller capacities than Node1.


Each node has multiple applications running when all of the nodes are functioning properly. If Node1 fails, App1
fails over to Node2, and App2 and App3 to Node3. App4 and App5 on Node1 are discarded. All local applications
on Node3 will also be discarded to make room for App2 and App3.

Symmetric 1 to 1 Configurations
This topic describes fault resilient, symmetric, 1 to 1 configurations.

Overview of Symmetric Failover


Symmetric failover enables both hosts to run production applications. The hosts then monitor each other through the
dedicated heartbeat networks.


In the event of a service failure, the other server would take over and run both applications.


The IP address moves to the host that is running the service. When a failover occurs, the service is failed to the
alternate node, and that node is configured with the new IP address, as well as its old address. This way, client-side
applications do not require reconfiguration to be able to locate the recovered version of the application. Of course,
any TCP connections that were open with the old instance of the service will be terminated by the failover, and new
TCP connections will need to be established. In many cases, the restoration of the TCP connection is transparent to
the user.

Capacity Considerations for Symmetric Clusters


In a symmetric configuration, all of the systems should have equivalent hardware, such as memory, CPU, and I/O
capacities. They can be used simultaneously under normal operating conditions. The administrator must ensure that
sufficient memory, CPU, and I/O capacity are available on the surviving servers in the event that an application is
failed over.
In an example of a symmetric cluster, suppose the cluster contains three nodes: Node1, Node2, and Node3.


Each node has multiple applications running when all of the systems are functioning properly. If Node1 fails, App4
is transferred to Node2, App3 to Node3, and App6 is discarded. Node2 must have enough available capacity during
normal operations to accommodate App4 in the event of a Node1 failure. Similarly, Node3 must have enough
available capacity for App3.
Note that in symmetric failover, the hosts are generally configured with more processing and I/O power than is
needed to run their individual applications. The effect of running both sets of applications on one host must be
considered. If both are running at capacity and one fails, the performance of the remaining one will be poor.
On the surface, it would appear the symmetrical configuration is a far more beneficial configuration in terms of
hardware utilization. Many organizations dislike the concept of a valuable system sitting idle. There is a flaw in this
line of reasoning, however. In asymmetrical failover, the takeover server would need only as much processor power
as its peer. On failover, performance would remain the same. In symmetrical failover, the takeover server would
need sufficient processor power to not only run the existing application, but also enough for the new application it
takes over. If a single application needs one processor to run properly, an asymmetric configuration would need two
single processor systems. To run identical applications on each server, a symmetrical configuration would require
two dual processor systems.

N to 1 Clustering
This topic describes a traditional N to 1 networked cluster configuration.

N to 1 Cluster Scalability
One important consideration in clustering is scalability. Most HA packages can scale to eight or more nodes. It is
important to note that attaching more than two hosts to a single SCSI storage device becomes problematic, as
specialized cabling must be used. In most cases, scaling beyond four hosts is not practical, as it severely limits the
actual number of SCSI disks that can be placed on the bus.


Example of a 4 to 1 Cluster

This example illustrates the inherent complexities of a 4 to 1 cluster. Each of the four primary servers is connected to a set of two disks. All the disks are connected to a fifth server that acts as the backup server. This could be an asymmetric or a symmetric cluster. The major difference is in the functionality of the backup server:
In a 4 to 1 asymmetric configuration the fifth server would simply act as the standby server. The four
primary servers act independently of one another. In the event of a single server failure, its services would
be failed over to the standby server.
In a 4 to 1 symmetric configuration, the fifth server would act as the standby server and also run
applications.

N to 1 Clustering on the SCSI Bus


An initiator, as the name implies, initiates commands. SCSI host bus adapters (HBAs) are the initiators; SCSI drives are targets. On a SCSI bus, N to 1 data sharing requires multi-initiator attachment. Multi-initiator attachment requires the capability to change the SCSI target ID of the HBA, since only one HBA on the bus can have the highest-priority ID of 7. It also requires special support in the driver to release control of the bus to another initiator.


This diagram shows how a cluster of systems might share a group of disks. Notice that each of the HBAs on the bus must have a high-priority but distinct SCSI target ID. Special cables must be used to attach more than two hosts to the bus.
Disadvantages of this configuration include:
The potential for duplicate IDs
Complicated termination issues that can result in the loss of data
Compatibility between controllers (For example, you must have differential SCSI devices if you have a
differential controller.)

Dual hosted SCSI


Dual hosted SCSI has existed for a number of years and functions well in small cluster configurations. The primary
limitation of dual hosted SCSI is scalability. Typically, two (to a maximum of four) systems can be connected to a
single drive array. Large storage vendors, such as EMC, provide high-end arrays with multiple SCSI connections in
order to overcome scalability issues. In most cases, the nodes are connected to a simple array in a configuration
illustrated in this diagram.

A typical SCSI bus has one SCSI initiator for the controller or HBA, and one or more SCSI targets for the drives. To configure a dual hosted SCSI configuration, one SCSI initiator ID must be set to a value different from that of its peer. The SCSI target IDs must be chosen so they do not conflict with the ID of any installed drive or with an initiator ID.
Setting the SCSI Initiator ID
The method of setting SCSI initiator IDs depends on the system manufacturer. For example, Sun Microsystems provides two methods to set SCSI initiator IDs:
Changing the scsi-initiator-id value
This affects all SCSI controllers in the system, including the internal controller for the system disk and CD-ROM. When choosing a new controller ID, be careful not to conflict with the boot disk, floppy drive, or CD-ROM.
Editing the SCSI driver control file
This file is in the /kernel/drv area and sets the SCSI initiator ID on a per-controller basis. NT and Intel systems are typically set on a per-controller basis with a utility package provided by the SCSI controller manufacturer. You should refer to your system documentation for details.
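As an illustration of the Sun methods above, the cluster-wide value can be changed either at the OpenBoot PROM prompt or with the eeprom command from a running system. The chosen ID of 5 and the driver file name mentioned below are assumptions for this sketch only, not values from any particular configuration:

ok setenv scsi-initiator-id 5          (at the OpenBoot PROM prompt; affects all controllers)
ok reset-all

# eeprom scsi-initiator-id=5           -- sets the same NVRAM variable from a running Solaris system

To set the ID per controller instead, add a line such as scsi-initiator-id=5; to the controller's .conf file under /kernel/drv (the exact file name depends on the SCSI driver in use) and then perform a reconfiguration boot (boot -r).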


Common Problems in Dual Hosted SCSI


The most common problems that are encountered when attempting to configure dual hosted SCSI are:
Duplicate SCSI Target IDs
The most common problem encountered when configuring shared SCSI storage is duplicate SCSI target
IDs. A duplicate SCSI target ID will, in many cases, exhibit different symptoms depending on whether
there are duplicate controller IDs, or a controller ID conflicting with a disk drive.
Duplicate Initiator IDs
This is a very serious problem that is more difficult to identify than duplicate SCSI target IDs. In a normal communication sequence, a target can only respond to a command from an initiator; if an initiator sees a command from another initiator, the command is ignored.

The problem may manifest itself during simultaneous commands from both initiators. A controller could issue a command, see a response from a drive, and assume all is well, when the response may actually have been to a command from the peer system; the original command may not have executed successfully. Carefully examine the systems attached to the shared SCSI bus and make certain that their controller IDs are different.
Configuring Dual Hosted SCSI: Example
The following is an example of how to set up a typical dual hosted SCSI configuration:
1. Attach the storage to one system.
2. Terminate the SCSI bus at the array.
3. Power up the host system and array.
4. Verify all drives can be seen with the operating system by using available commands such as the format
command.
5. Identify the SCSI drive IDs that are used in the array and the internal SCSI drives if they are present.
6. Identify the SCSI controller ID.
7. Identify a suitable ID for the controller on the second system.
This ID must not conflict with any drive in the array or the peer controller. If you plan to set all controllers
to a new ID, ensure that the controller ID chosen on the second system does not conflict with internal SCSI
devices.
8. Set the new SCSI controller ID on the second system.
9. Power down both systems and the external array.
10. Disconnect the SCSI terminator and connect the array to the second system.
11. Power up the array and both systems.
Depending on the hardware platform, you may be able to check for array connectivity before the OS is brought up. Boot console messages such as "unexpected SCSI reset" are a normal occurrence during the boot sequence of a system connected to a shared array: most SCSI adapters perform a bus reset during initialization, and the message is generated when a system sees a reset that was initiated by its peer.
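As an illustration of the verification steps on a Sun host, checks along the following lines can be used; the device names reported by format will of course differ from site to site:

ok probe-scsi-all                      (at the OpenBoot PROM prompt; lists every device seen on each SCSI bus)

# format                               -- from Solaris, lists the drives the OS can see (for example c1t0d0, c1t1d0, ...)
# eeprom scsi-initiator-id             -- displays the cluster-wide initiator ID currently stored in NVRAM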

N to 1 SAN Clustering
This topic describes the implementation of an N to 1 clustering design in a Storage Area Network (SAN)
environment. SANs are specialized high-speed networks that enable fast, reliable access among computers and
independent storage resources. In a SAN, all networked servers share storage devices as peer resources. In other
words, they are not the exclusive property of any one server. You can use a SAN to connect servers to storage,
servers to each other, and storage to storage through hubs, switches, and routers.


Defining SAN
SANs are defined as specialized, high-speed networks that are specifically dedicated to storage. SANs provide fast,
reliable access among systems and storage resources. The Storage Networking Industry Association (SNIA) defines
a SAN as:

"A network whose primary purpose is the transfer of data between computer systems and
storage elements and among storage elements. Abbreviated SAN. A SAN consists of a
communication infrastructure, which provides physical connections, and a management
layer, which organizes the connections, storage elements, and computer systems so that
data transfer is secure and robust."
Fibre Channel
Although the definition of a SAN does not specifically mention Fibre Channel technology, the Fibre Channel
protocol was the foundation for the development of SAN technology. With the emergence in the mid-1990s of Fibre
Channel-based networking devices, such as Fibre Channel switches, companies began to create networked
environments for storage in which servers and storage were connected in an any-to-any fashion, supported by a
highly reliable, high-performance fabric network. Fibre Channel, for the first time, enabled companies to virtualize
storage and provide high-speed access to information from any storage device to any server.

SAN Benefits
Attaching more than two hosts to a traditional, single SCSI storage device becomes problematic. SANs enable you
to connect a large number of hosts to a nearly unlimited amount of storage. This allows much larger clusters to be
constructed relatively easily.
A SAN carries only I/O traffic between servers and storage devices. It does not carry general-purpose traffic such as
email or other end user applications. Therefore, it avoids the compromises inherent in using a single network for all
applications. With this shared capacity, organizations can acquire, deploy, and use storage devices more cost-effectively. Ultimately, on a SAN, any data at any network location is accessible, often through multiple paths, by
any nodes, applications, or users on the network. Storage on a SAN is shared, resulting in centralized management,
better utilization of disk and tape resources, and enhanced enterprise-wide data management and protection.


SANs are designed to replace today's point-to-point access methods with a new any-to-any architecture. In the
traditional model, if disks are logically shared, this sharing occurs at LAN speeds, such as 100 megabits/second, or
is limited to the small number of nodes which can be directly attached to a given disk array. Through the addition of
a high-speed switch, clients can access any disk from any node on the SAN at channel speeds, such as 100MB/sec.
This allows a much larger number of nodes much faster access to a much larger centralized data store.


Redundancy is easily added to a SAN through the incorporation of a second switch or redundant switching
components to support high availability data access. Additional nodes and disk arrays can be easily added to these
configurations with minimal disruption by plugging new components into the switch, providing a much simpler and
more scalable growth path than traditional architectures. Finally, any node in the SAN may potentially back up any
other node. One or two dedicated nodes can now back up a much greater number of nodes, thereby significantly
reducing the hardware costs associated with cluster configurations.


Failover Granularity in Clusters


This topic describes the concepts and requirements of application-level failover, as opposed to server-level failover.

First Generation Failover Granularity


A significant limitation of first generation failover management systems is failover granularity. Failover granularity refers to what must fail over in the event of a failure. First generation systems had a failover granularity equal to a server: in the event of the failure of any HA application on a system, all applications would fail over to a second system. This severely limited the scalability of any server. For example, running multiple production Oracle instances on a single system is problematic, because the failure of any one instance will cause an outage of all the instances on the system while all applications are migrated to another server.

Second Generation Failover Granularity


One of the distinguishing features of second generation HA systems is the concept of resource groups, or service groups. Particularly on large servers, it is rare that the entire server will be dedicated to a single application service. Configuring multiple domains on an enterprise server partially alleviates the problem; however, multiple applications may still run within each domain. Failures that affect a single application service, such as a software failure or hang, do not necessarily affect other application services that may reside on the same physical host or domain. If they do, then downtime may be unnecessarily incurred for the other application services.

Application Services
An application service is the service the end user perceives when accessing a particular network address. An
application service is typically composed of multiple resources, some hardware and some software based, all
cooperating together to produce a single service.

Example of an Application Service


For example, a database service may be composed of one or more logical network IP addresses, RDBMS software,
an underlying file system, a logical volume manager, and a set of physical disks that are being managed by
VERITAS Volume Manager. If this database service needed to be migrated to another node for recovery purposes,
all of its resources must migrate together to re-create the service on another node.
A single large node may host any number of application services, each providing a discrete service to networked
clients who may or may not know that they physically reside on a single node.
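To make this concrete, such a database service maps naturally onto a VCS service group whose resources mirror the list above. The following is a minimal sketch in the style of a main.cf definition; every group name, resource name, and attribute value here is invented for illustration, and the RDBMS resource itself (for example an Oracle resource) would sit on top of the Mount and IP resources:

group dbservice (
        SystemList = { nodeA = 0, nodeB = 1 }
        AutoStartList = { nodeA }
        )

        DiskGroup db_dg (
                DiskGroup = datadg
                )

        Volume db_vol (
                Volume = datavol
                DiskGroup = datadg
                )

        Mount db_mnt (
                MountPoint = "/data"
                BlockDevice = "/dev/vx/dsk/datadg/datavol"
                FSType = vxfs
                )

        NIC db_nic (
                Device = hme0
                )

        IP db_ip (
                Device = hme0
                Address = "10.10.10.10"
                )

        // Parent requires child: the chain is brought online from the bottom up.
        db_vol requires db_dg
        db_mnt requires db_vol
        db_ip requires db_nic

If this service group is migrated to another node, the disk group, volumes, file system, and IP address are taken offline together on the failed node and brought online in the reverse order on the takeover node, re-creating the service as a unit.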
Application Service Management
Application services can be proactively managed to maintain service availability through an intelligent availability
management tool. An application service can be made highly available if it is possible to test the application service
to ensure that it is providing the expected service to networked clients and you can automatically start and stop the
application service.
If multiple application services are running on a single node, then they must be monitored and managed
independently. Independent management allows an application service to be automatically recovered or manually
idled for administrative or maintenance reasons without necessarily impacting any of the other applications running
on a node. This is particularly important on larger servers, which may easily be running eight or more applications
concurrently. Of course, if the entire server crashes, as opposed to just a software failure or hang, then all the
application services on that node must be recovered elsewhere.
At the most basic level, the fault management process includes monitoring an application service and, when a failure
is detected, restarting that application service automatically. This could mean restarting it locally or moving it to
another node and then restarting it, as determined by the type of failure incurred. In the case of local restart in
response to a fault, the entire application service does not necessarily need to be restarted; perhaps just a single
resource within that application service may need to be restarted to restore the service.
Load Balancing
Given that application services can be independently manipulated, a failed node's workload can be load balanced
across remaining cluster nodes, and potentially failed over successive times without manual intervention. In this
example, a three node cluster is operating normally while running four applications.

The second node fails. On recovery, the application load of the failed server is balanced across the other two nodes


If another server fails, all of the applications would failover to the remaining server.

Application Requirements for Failover


Nearly all applications can be placed under cluster control, as long as basic guidelines are met:
The application must have a defined procedure for startup
This means that the failure management software can determine the exact command used to start the
application, as well as all other outside requirements the application may have, such as mounted file
systems, IP addresses, etc. For example, an Oracle database agent needs the Oracle user, Instance ID,
Oracle home directory, and the pfile. The developer must also know implicitly what disk groups, volumes,
and file systems must be present.
The application must have a defined procedure for stopping.
This means that an individual instance of an application must be capable of being stopped without affecting
other instances. For example, using a Web server, killing all HTTPD processes is unacceptable since it
would stop other Web servers as well.
The application must have a defined procedure for monitoring the overall health of an individual instance (a sample monitor script is sketched after this list).
Using the Web server as an example, simply checking the process table for the existence of "httpd" is unacceptable, as any Web server would cause the monitor to return an online value. Checking whether the PID contained in the PID file is actually in the process table is a better solution.
To add more robust monitoring, an application can be monitored from closer to the user perspective. For example, an HTTPD server can be monitored by connecting to the correct IP address and port and testing whether the Web server responds to HTTP commands.

In a database environment, the monitoring application can connect to the database server, perform SQL commands, and verify reads and writes to the database. It is important that the data written for subsequent read-back is changed each time, to prevent caching from hiding underlying problems.
In both cases, end-to-end monitoring is a far more robust check of application health. The closer a test comes to exactly what a user does, the better the test is at discovering problems. This does come at a price: end-to-end monitoring increases system load and may increase system response time. From a design perspective, the level of monitoring implemented should be a careful balance between assuring that the application is up and minimizing monitor overhead.

The application must be capable of storing all required data on shared disks.
This may require specific setup options or even soft links. For example, the VERITAS NetBackup product is designed to install in the /usr/openv directory only. This requires either linking /usr/openv to a file system mounted from the shared storage device or actually mounting a file system from the shared device on /usr/openv. Similarly, the application must store its data to disk rather than maintaining it only in memory, so that the takeover system is capable of accessing all required information.
The application must be capable of being restarted to a known state.
This is the most important application requirement. On a switchover, the application is brought down under
controlled conditions and started on another node. The application must close out all tasks, store data
properly on shared disk, and exit. At this time, the peer system can startup from a clean state. A problem
arises when one server crashes and another must take over. The application must be written in such a way
that data is not stored in memory, but regularly written to disk.

A commercial database such as Oracle is a perfect example of a well-written, crash-tolerant application.
On any given client SQL request, the client is responsible for holding the request until it receives an
acknowledgement from the server. When the server receives a request, it is placed in a special log file, or
"redo" file. This data is confirmed as being written to stable disk storage before acknowledging the client.
At a later time, Oracle then de-stages the data from redo log to actual table space. After a server crash,
Oracle can recover to the last known committed state by mounting the data tables and applying the redo
logs. This in effect brings the database to the exact point of time of the crash. The client resubmits any
outstanding client requests not acknowledged by the server; all others are contained in the redo logs.
One key factor to note is the cooperation between client application and server. This must be factored in
when assessing the overall cluster compatibility of an application.

The application must be capable of running on all servers designated as potential hosts.
This means there are no licensing issues, host name dependencies, or other such problems. Prior to attempting to bring an application under cluster control, it is highly advisable to test-run the application on all systems in the proposed cluster that may be configured to host it.
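As a sketch of the monitoring guideline above, a custom monitor for a web application service might look like the following shell script. Everything in it is an assumption made for illustration: the PID file location, the service address and port, and the 110 (online) / 100 (offline) exit-code convention commonly used by VCS script-based agents; check the agent framework documentation for the convention your version expects.

#!/bin/sh
# Hypothetical monitor script for a web application service (illustration only).
PIDFILE=/var/run/mywebapp/httpd.pid        # assumed PID file location
ADDR=192.168.33.10                         # assumed floating service address
PORT=80

# 1. Is the process recorded in the PID file actually running?
[ -f "$PIDFILE" ] || exit 100              # offline: no PID file
PID=`cat "$PIDFILE"`
kill -0 "$PID" 2>/dev/null || exit 100     # offline: the process is gone

# 2. End-to-end check: can a TCP connection be opened to the service port?
#    A small Perl one-liner is used here; replace it with whatever client-side
#    test best matches what real users of the application actually do.
perl -e 'use IO::Socket; exit(IO::Socket::INET->new(shift) ? 0 : 1)' "$ADDR:$PORT" || exit 100

exit 110                                   # online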


Example: Configuring Veritas Cluster on Solaris 2.8 with VxFS and Volume Manager

Setup:
1. Two E450s, each with 2 GB RAM and two 480 MHz CPUs.
2. Two A1000 RAID boxes, each with four 18.1 GB hard disks, managed with Raid Manager (rm6) 6.22.
3. One onboard network card (hme0), one additional card (hme1), and one gigabit Ethernet card (ge0) in each server.
4. A crossover cable and a fiber cable for the heartbeat links.
Procedure:
1. Load Solaris 2.8 on both machines along with the latest patches.
2. Load the rm6 utility for configuring the A1000s. There are 4 hard disks in each array; configure them as RAID 0, so each logical volume will appear as a 72 GB hard disk in Volume Manager.
3. In the cluster, the two A1000 boxes will be used as shared disks.
4. Before connecting the A1000s to both machines, do not forget to change the scsi-initiator-id of one machine (preferably the machine that will be your secondary server in the cluster). Put the differential SCSI card in slot 1 and the other in slot 5 (counting from the top) on both servers; otherwise the server will continuously report "AUTO SENSE RESET FAILED" errors.
5. After this, connect each E450 to both arrays and verify the hard disks using the format command.
6. Load the required array patches.
7. Load VERITAS Volume Manager 3.1.
VERITAS Volume Manager 3.1 has built-in VxFS 3.3.3. The following packages are used for VxFS:
a. VRTSvxfs
b. VRTSqio
c. VRTSqlog
Remember to add the VRTSvxfs package first and then packages (b) and (c); otherwise the remaining two packages will give errors while installing.

After loading Volume Manager 3.1, load the required Volume Manager and VxFS patches. Install the Volume Manager and VxFS licenses using the vxlicense -c command.
8. On both servers, include both internal hard disks in the rootdg disk group and mirror them through Volume Manager.
9. Assume the hostname of the primary server is dotsoft1 and that of the secondary server is dotsoft2. On dotsoft1, create an additional disk group called bsnldg and include the array hard disks (configured as RAID 0) in that disk group. Mirror the disks included in the bsnldg disk group from Volume Manager.
10. Check the major and minor numbers in the /dev/dsk directory on both the primary and secondary servers after confirming that the array volumes are detected and mounted on both servers. Also check the values of vxio and vxspec in /etc/name_to_major on both servers. If the values differ, change them (preferably on the secondary server) so that they match; otherwise you will face problems later during clustering.
11. Load the gigabit Ethernet driver from the CD that comes with the Solaris 8 media pack.
12. Assign the public IP addresses: on dotsoft1, hme0 - 192.168.33.6; on dotsoft2, hme0 - 192.168.33.7.
13. DO NOT PLUMB THE PRIVATE CARDS (hme1 and ge0); they will be taken care of by the Veritas Cluster software. Also remove the entries for all mount points of the shared array from the /etc/vfstab file.
14. Before proceeding further, write down the resources you want to put under VCS control and that will fail over to the other live system in the cluster, for example: NIC, disk partition, IP, mount points, and so on.
15. VCS uses two components, LLT and GAB, to share data over the private networks among systems. LLT provides fast kernel-to-kernel communication and monitors network connections; it is configured using the llttab file, which describes the systems in the cluster and the private network links among them. GAB (Group membership and Atomic Broadcast) provides the global message ordering required to maintain a synchronized state among the systems and monitors disk communications such as those required by the VCS heartbeat utility. The GAB driver is configured by creating the gabtab file.
16. Mount the VCS CD and run the command: # ./InstallVCS
Enter the following information:

Please enter unique cluster name: bsnlcluster
Please enter unique cluster id: 2 (this has to be unique if multiple cluster setups exist in the organization)
Enter the systems on which you want to install VCS: dotsoft1 dotsoft2
(the names should be separated by spaces)

After this it will start installing the software on both machines. The process discovers all the network cards in the system and prompts for the device files to use for the private links. For example, for the second hme card enter /dev/hme1; for qfe cards enter /dev/qfe1 or /dev/qfe2. For gigabit Ethernet cards select "other"; you will then be prompted to enter the actual device file, so for the ge card enter /dev/ge0. In our case the hme1 and ge0 cards were used for the private links, so the device files are /dev/hme1 and /dev/ge0.
17. The same information will be asked for the other server, so enter the same device files for the other server.
18. Reboot the servers.
19. To verify whether the installation is successful, check the following files (sample contents are shown after this step):
a. /etc/llthosts --> should contain an entry for both servers.
b. /etc/llttab --> cluster node ID along with private link information.
c. /etc/gabtab --> should contain the gabconfig command.
d. /etc/VRTSvcs/conf/config/main.cf --> the main configuration file for configuring the Veritas cluster. Entries are made in this file either from the command line or through the GUI.
The ./InstallVCS script creates a user 'admin' with the password 'password', required for Veritas cluster administration through the GUI (the Veritas Cluster Manager).
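As an illustration, on the two-node cluster built in this example these files might contain entries along the following lines. The node IDs, link tags, and device paths are assumptions based on the hardware described above; the files actually generated by InstallVCS on your systems are authoritative.

/etc/llthosts (node ID to host name mapping):
0 dotsoft1
1 dotsoft2

/etc/llttab (on dotsoft1; this node, the cluster ID, and the two private links):
set-node dotsoft1
set-cluster 2
link hme1 /dev/hme:1 - ether - -
link ge0 /dev/ge:0 - ether - -

/etc/gabtab (seed GAB once both nodes are present):
/sbin/gabconfig -c -n2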
20. Before starting the configuration, set the PATH variable:
PATH=$PATH:/sbin:/usr/sbin:/opt/VRTSvcs/bin ; export PATH


21. Check the private link status using the command:
# lltstat -nvv | more
22. To verify that GAB is operating, use the command:
# /sbin/gabconfig -a
This command should return GAB port membership information.
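On a healthy two-node cluster the output looks roughly like the following; port a is GAB membership and port h is the VCS engine (had), and the generation numbers shown here are only placeholders:

GAB Port Memberships
===============================================================
Port a gen a36e0003 membership 01
Port h gen fd570002 membership 01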
23. To verify whether the cluster is operating, use the following command:
# hastatus -summary
24. To install the Veritas Cluster Manager GUI, add the following package from the cluster CD: VRTScscm
25. The core VCS processes, which should be running, are:
a. had --> the VCS engine, which maintains the configuration information and administers failover.
b. hashadow --> the process that monitors and restarts the VCS engine.
c. halink --> the process that monitors the links between systems in the cluster.
26. The three main components of a Veritas cluster are:
a. Resources --> hardware or software entities such as hard disks, NICs, IP addresses, applications, and databases.
b. Resource types --> resources are classified into types, and multiple resources can be of one type; for example, two volumes can both be of type Volume.
c. Service groups --> the most important component of VCS. A service group is composed of related resources; when a service group is brought online, all the resources within it are automatically brought online. A failover service group can be brought online on only one system at a time; for example, file systems are configured in failover service groups. A parallel service group can be fully or partially online on both servers at the same time; for example, OPS (Oracle Parallel Server) is configured as a parallel service group.
27. In a Veritas cluster, dependencies between the resources have to be created. The dependencies specify the order in which the resources within a service group are brought online and taken offline. For example, if a service group called abc is created that has a disk group and volumes as resources, then when the service group is brought online the disk group is brought online first and then the volumes, so a dependency between the disk group and the volume has to be created. In the same way, a dependency between the volume and the mount point has to be created. Since the disk group comes up first, it is called the child and the volume is called the parent; likewise, between the volume and the mount point, the volume is the child and the mount point is the parent. The same holds true between a NIC and an IP address.
28. Before starting the VCS GUI, the configuration has to be made read/write. Use the following commands:
# haconf -makerw           -- set the configuration to read/write mode
# hauser -add username     -- add another user
# haconf -dump -makero     -- dump the configuration and reset it to read-only
# xhost +
# hagui &
This opens a console. Log in as admin or as the user you just created.


29. To create the resources and dependencies:
a. Create a service group called bsnl. Include the DiskGroup resource in it by selecting the Add Resource tab; the disk group "bsnldg" was already created using Veritas Volume Manager. Click on the properties of the DiskGroup resource and enter its attributes, such as the disk group name.
b. Create resources of type Volume by selecting Add Resource, and include all 10 volumes that were created using Veritas Volume Manager; in the properties tab of each Volume resource, enter the volume name. (In the same way, create the 10 mount points after creating resources of type Mount; from the properties tab, specify the mount point name, the physical device to which the mount point corresponds, and the volume to which it corresponds.)
c. Create the dependency between the DiskGroup and the Volume resources. The Volume will be the parent and the DiskGroup will be the child. To create the dependency, a link has to be created between the Volume and the DiskGroup (by dragging the mouse).
When logging in as admin, the password will be "password". Also enter the cluster name.
The screen looks as follows:

As shown, Exch_NIC represents the NIC card resource and Exch_IP represents the IP resource. Exch_NIC is the child, whereas Exch_IP is the parent. In the same way, Exch_DiskRes is the child and Exch_MountX is the parent.
To create a link between two resources, drag the mouse from one object to the other. It will ask whether Exch_DiskRes is the child and Exch_MountX is the parent. In the same way, create links or dependencies between all the objects.
In the above diagram, VCSNT5 and VCSNT6 are the two systems in the cluster.
Once these dependencies are created, start the cluster services on the primary server; all the volumes in the shared array will then get mounted. On issuing the haswitch command, all the volumes will get mounted on the secondary server.
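The same resources and dependencies can also be built from the command line instead of the GUI. The following is a brief sketch using the group and disk group names from this example, with a single volume/mount pair shown for brevity; the resource names and attribute values are illustrative, and newly added resources may also need to be enabled (hares -modify <resource> Enabled 1) before they can be brought online.

# haconf -makerw                                   -- make the configuration writable
# hagrp -add bsnl                                  -- create the service group
# hagrp -modify bsnl SystemList dotsoft1 0 dotsoft2 1
# hagrp -modify bsnl AutoStartList dotsoft1
# hares -add bsnl_dg DiskGroup bsnl                -- disk group resource
# hares -modify bsnl_dg DiskGroup bsnldg
# hares -add bsnl_vol01 Volume bsnl                -- one of the volume resources
# hares -modify bsnl_vol01 Volume vol01
# hares -modify bsnl_vol01 DiskGroup bsnldg
# hares -add bsnl_mnt01 Mount bsnl                 -- the corresponding mount point
# hares -modify bsnl_mnt01 MountPoint /data01
# hares -modify bsnl_mnt01 BlockDevice /dev/vx/dsk/bsnldg/vol01
# hares -modify bsnl_mnt01 FSType vxfs
# hares -link bsnl_vol01 bsnl_dg                   -- Volume (parent) depends on DiskGroup (child)
# hares -link bsnl_mnt01 bsnl_vol01                -- Mount (parent) depends on Volume (child)
# haconf -dump -makero                             -- save and close the configuration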
To check which commands are being executed in the background click the "command
center" icon.


Configuring Membership Heartbeat Regions on Disk


Group membership heartbeat regions can also be set up on the shared disks/array for use as an additional path for VCS heartbeating. With these regions configured, VCS has multiple heartbeat paths available in addition to the network connections, so if one private link fails, VCS still has another network connection and a heartbeat disk region that continue heartbeating. For this, two regions of 128 blocks each need to be configured.
These regions cannot be configured on VxVM volumes, only on block ranges of the underlying physical device. So if the shared disk is under Volume Manager control, the following steps need to be followed:
a. Bring the shared disk under Volume Manager control. Say the disk name is c3t5d0.
b. Unmount all file systems on the disk.
c. Remove all the volumes (hence, create this region before creating volumes).
d. Remove the disk from Volume Manager control.
e. Give the following command:
# hahbsetup c3t5d0
It will give the following output:
The hadiskhb command is used to set up a disk for combined use
by VERITAS Volume Manager and VERITAS Cluster Server for disk
communication.
WARNING: This utility will destroy all data on c3t5d0
Have all disk groups and file systems on disk c1t1d0 been either
unmounted or deported? y
There are currently slices in use on disk /dev/dsk/c3t5d0s2
Destroy existing data and reinitialize disk? y
1520 blocks are available for VxCS disk communication and
service group heartbeat regions on device /dev/dsk/c3t5d0s7
This disk can now be configured into a Volume Manager disk
group. Using vxdiskadm, allow it to be configured into the disk
group as a replacement disk. Do not select reinitialization of
the disk.
After running vxdiskadm, consult the output of prtvtoc to
confirm the existence of slice 7. Reinitializing the disk
under VxVM will delete slice 7. If this happens, deport the disk
group and rerun hahbsetup.

f. On running the format command for the c3t5d0 disk, you will observe that 2 MB of space has been created in the s7 slice.
g. Now re-add the disk under Volume Manager control using the disk-replacement option in vxdiskadm, allowing it to be configured into the disk group as a replacement disk as indicated in the hahbsetup output above.
(PLEASE DO NOT REINITIALIZE THE DISK)
After the dependencies for the resources are created, try to bring the group online using the following procedure:
1) Select the group you want to online/offline/switch in the "Service Groups" view.
2) Right-click on the group. (If the group is offline, you will get options for bringing it online; if the group is already online, options for offline and switch-to are available.)
3) Test the online and offline operations on the local system as well as the remote system.
If everything goes through up to this step, you are ready to switch the group over.
4) Right-click on the group in the "Service Groups" view, click switch-to, and click on the remote system name in the cluster. (This will offline the group on the local system and online it on the remote system. Before the switch-to, the group should be online on the local system.)
5) Check with the help of the ifconfig -a and df -k commands on both systems whether the mount points and the IP address have been transferred from the local to the remote system.
6) If all the mount points and the IP address configured in VCS have switched over and come up online successfully on the remote system, go ahead and directly switch off the system on which the group is currently online.
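The same online and switch-over operations can be performed from the command line; a short sketch, assuming the group name bsnl and the system names used earlier in this example:

# hagrp -online bsnl -sys dotsoft1     -- bring the group online on the primary server
# hagrp -switch bsnl -to dotsoft2      -- switch the group to the secondary server
# hagrp -state bsnl                    -- confirm the group state on each system
# hastatus -summary                    -- overall cluster summary
# df -k ; ifconfig -a                  -- run on both systems to verify that the mount points and IP have moved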
ORACLE AGENT FOR ORACLE DATABASE:
The Oracle agent monitors the Oracle database service, and a companion agent monitors the SQL*Net listener process.
1. The Oracle agent works in 3 modes:
a. ONLINE: uses the svrmgrl command to open the database.
b. OFFLINE: uses the svrmgrl command to close the database (shutdown immediate).
c. MONITOR: scans the process table for ora_pmon, ora_smon, and ora_lgwr.
2. The SQLnet listener agent does the following:
a. ONLINE: uses lsnrctl start to start the listener process.
b. OFFLINE: uses lsnrctl stop to stop the listener process.
c. MONITOR: scans the process table for tnslsnr $LISTENER.
Requirements for the Oracle agent:
When the Oracle server application ($ORACLE_HOME) is installed on a shared disk, each cluster system must have the same mount point directory for the shared file system.
To install the Oracle agent:
# cd /cdrom/cdrom0
# pkgadd -d .
Now start the Cluster Manager GUI and import the OracleTypes.cf file into the VCS engine using the following method:
a. Start the Cluster Manager GUI.
b. Click on the File menu and select the import option.
c. In the dialog box, select the file /etc/VRTSvcs/conf/sample_Oracle/OracleTypes.cf.
d. Save the configuration using the File > Save option.
This makes Oracle available as a resource type in the cluster. (By default, resource types such as DiskGroup, Mount, Volume, NIC, IP, and Disk are present, but since Oracle is not, this method has to be used to make the Oracle resource type available to the cluster.)
After installing the Oracle agent, when you open the Cluster Manager GUI, "Oracle" will be present as a resource type. When you create a new resource of type Oracle, it will ask for the following information:
a. the SID (Oracle instance ID)
b. the owner (Oracle user)
c. the $ORACLE_HOME path
d. the Pfile (startup profile, e.g. $ORACLE_HOME/dbs/initSID.ora)
Similarly, a resource of type Sqlnet will be available. Add a resource of type Sqlnet and enter the following information:
a. the owner (Oracle user)
b. the $ORACLE_HOME path to the Oracle binaries
c. the name of the listener (the default is LISTENER)
d. the $TNS_ADMIN path to the directory in which the listener configuration file (listener.ora) resides

Now create the dependencies between these two resources (Oracle and Sqlnet). Also assign a demo IP that will float from one system to the other in case of a system failover; the IP will be the parent and the public NIC card will be the child. This demo IP will in turn act as a child to the Oracle agent, which will be its parent. Likewise, Oracle will be the child and Sqlnet will be the parent.
So the final dependency looks this way (from left to right, i.e. from child to parent; the Oracle agent is also the parent of the demo IP, which in turn is the parent of the NIC):

diskgroup - volumes - mount points - Oracle agent - Sqlnet
                                         |
                                      demo IP
                                         |
                                 NIC card (hme0:1)

So in case the system fails, the Sqlnet service will stop first (the parent goes offline first), then Oracle will shut down, the mount points will get unmounted, the volumes will go offline, and the disk group will automatically deport. At the same time the demo IP will go offline.
Since the child comes online first, on the other system the demo IP will come up, the disk group will automatically get imported, the volumes will then come online, the mount points will get mounted, the database will come up, and finally the listener service will start successfully.
Now you can switch off the machine and check whether the Oracle database comes up on the other system in the cluster.
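For reference, the same Oracle and Sqlnet resources and their dependencies can be wired up from the command line. The resource names, paths, and the demo IP address below are invented for illustration, and the exact attribute names should be checked against the OracleTypes.cf shipped with the agent:

# haconf -makerw
# hares -add ora_db Oracle bsnl                    -- Oracle database resource
# hares -modify ora_db Sid BSNL
# hares -modify ora_db Owner oracle
# hares -modify ora_db Home /oracle/product/8.1.7
# hares -modify ora_db Pfile /oracle/product/8.1.7/dbs/initBSNL.ora
# hares -add ora_lsnr Sqlnet bsnl                  -- listener resource
# hares -modify ora_lsnr Owner oracle
# hares -modify ora_lsnr Home /oracle/product/8.1.7
# hares -modify ora_lsnr TnsAdmin /oracle/product/8.1.7/network/admin
# hares -modify ora_lsnr Listener LISTENER
# hares -add bsnl_nic NIC bsnl                     -- public NIC
# hares -modify bsnl_nic Device hme0
# hares -add demo_ip IP bsnl                       -- floating (demo) IP
# hares -modify demo_ip Device hme0
# hares -modify demo_ip Address 192.168.33.10
# hares -link demo_ip bsnl_nic                     -- IP (parent) -> NIC (child)
# hares -link ora_db demo_ip                       -- Oracle (parent) -> demo IP (child)
# hares -link ora_db bsnl_mnt01                    -- Oracle (parent) -> Mount (child)
# hares -link ora_lsnr ora_db                      -- Sqlnet (parent) -> Oracle (child)
# haconf -dump -makero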
IMPORTANT COMMANDS
1. hastart -- start the VCS engine
2. hagrp -display -- display the service groups
3. hastatus -summary -- summary of cluster information
You can also review the cluster you have configured through the GUI in the main.cf file, present in the path /etc/VRTSvcs/conf/config.
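A few more commonly used commands, listed here as a quick reference; the options shown are the standard ones for VCS of this vintage and should be verified against the man pages on your installation:

# hastop -all          -- stop VCS on all systems (add -force to leave applications running)
# hastop -local        -- stop VCS on the local system only
# hagrp -online <group> -sys <system>    -- bring a service group online on a system
# hagrp -offline <group> -sys <system>   -- take a service group offline
# hagrp -switch <group> -to <system>     -- switch a group to another system
# hares -display       -- display all resources and their attributes
# hares -state         -- show the state of each resource on each system
# hasys -display       -- display system attributes
# lltstat -nvv         -- verify LLT link status
# gabconfig -a         -- verify GAB port membership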
