
winadmins.wordpress.com http://winadmins.wordpress.com/tag/clustering/
Failover Clustering in Windows Server 2008 R2 Part 1
Introduction
A failover cluster is a group of independent computers that work together to increase the availability of applications and services. The clustered servers (called nodes) are connected by physical cables and by software. If one of the cluster nodes fails, another node begins to provide service (a process known as failover). Users experience a minimum of disruptions in service.
Windows Server Failover Clustering (WSFC) is a feature that can help ensure that an organization's critical applications and services, such as e-mail, databases, or line-of-business applications, are available whenever they are needed. Clustering can help build redundancy into an infrastructure and eliminate single points of failure. This, in turn, helps reduce downtime, guards against data loss, and increases the return on investment.
Failover clusters provide support for mission-critical applications, such as databases, messaging systems, file and print services, and virtualized workloads, that require high availability, scalability, and reliability.
What is a Cluster?
A cluster is a group of machines acting as a single entity to provide resources and services to the network. If one machine fails, a failover occurs to another system in the group, which keeps those resources available to the network.
How Failover Clusters Work
A failover cluster is a group of independent computers, or nodes, that are physically connected by a local-area network (LAN) or a wide-area network (WAN) and that are programmatically connected by cluster software. The group of nodes is managed as a single system and shares a common namespace. The group usually includes multiple network connections and data storage connected to the nodes via storage area networks (SANs). The failover cluster operates by moving resources between nodes to provide service if system components fail.
Normally, if a server that is running a particular application crashes, the application will be unavailable until the server is fixed. Failover clustering addresses this situation by detecting hardware or software faults and immediately restarting the application on another node without requiring administrative intervention, a process known as failover. Users can continue to access the service and may be completely unaware that it is now being provided from a different server.
Figure: Failover clustering
Failover Clustering Terminology
1. Failover and Failback Clustering
Failover is the act of another server in the cluster group taking over where the failed server left off. An example of a failover system can be seen in the figure below. If you have a two-node cluster for file access and one node fails, the service will fail over to another server in the cluster. Failback is the capability of the failed server to come back online and take the load back from the node it failed over to.
2. Active/Passive cluster model
Active/Passive is defined as a cluster group where one server handles the entire load and, in case of failure or disaster, a passive node stands by waiting for failover.
One node in the failover cluster typically sits idle until a failover occurs. After a failover, this passive node becomes active and provides services to clients. Because it was passive, it presumably has enough capacity to serve the failed-over application without performance degradation.
3. Active/Active failover cluster model
All nodes in the failover cluster are functioning and serving clients. If a node fails, the resource will move to another node and continue to function normally, assuming that the new server has enough capacity to handle the additional workload.
4. Resource. A hardware or software component in a failover cluster (such as a disk, an IP address, or a network name).
5. Resource group.
A combination of resources that are managed as a unit of failover. Resource groups are logical collections of cluster resources. Typically, a resource group is made up of logically related resources such as applications and their associated peripherals and data. However, resource groups can contain cluster entities that are related only by administrative needs, such as an administrative collection of virtual server names and IP addresses. A resource group can be owned by only one node at a time, and individual resources within a group must exist on the node that currently owns the group. At any given time, different servers in the cluster cannot own different resources in the same resource group.
6. Dependency. A relationship between two or more resources in the cluster architecture, in which one resource must be online before a dependent resource can be brought online.
7. Heartbeat.
The cluster's health-monitoring mechanism between cluster nodes. This health checking allows nodes to detect failures of other servers in the failover cluster by sending packets to each other's network interfaces. The heartbeat exchange enables each node to check the availability of other nodes and their applications. If a server fails to respond to a heartbeat exchange, the surviving servers initiate failover processes, including ownership arbitration for resources and applications owned by the failed server.
In the simplest two-node case, the heartbeat is just a stream of packets exchanged between the passive and the active node. When the passive node no longer sees the active node, it brings the clustered resources online.
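The failure-detection idea behind the heartbeat can be sketched in a few lines of Python. This is only a conceptual illustration, not how WSFC is actually implemented: the interval, miss threshold, and node names below are assumptions chosen for the example.

import time

# Illustrative values only; WSFC's real heartbeat settings differ.
HEARTBEAT_INTERVAL = 1.0   # seconds between heartbeat packets
MISSED_THRESHOLD = 5       # silent intervals before a node is presumed failed

class NodeMonitor:
    """Tracks the last heartbeat received from each peer node."""

    def __init__(self, peers):
        now = time.monotonic()
        self.last_seen = {peer: now for peer in peers}

    def record_heartbeat(self, peer):
        # Called whenever a heartbeat packet arrives from a peer.
        self.last_seen[peer] = time.monotonic()

    def failed_peers(self):
        # A peer is presumed failed after MISSED_THRESHOLD intervals of silence,
        # at which point the surviving node would start failover arbitration.
        deadline = MISSED_THRESHOLD * HEARTBEAT_INTERVAL
        now = time.monotonic()
        return [p for p, t in self.last_seen.items() if now - t > deadline]

monitor = NodeMonitor(["NODE1", "NODE2"])
monitor.record_heartbeat("NODE1")
print(monitor.failed_peers())  # [] until a peer stays silent long enough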
8. Membership. The orderly addition and removal of nodes to and from the cluster.
9. Global update. The propagation of cluster configuration changes to all cluster members.
10. Cluster registry. The cluster database, stored on each node and on the quorum resource, maintains configuration information (including resources and parameters) for each member of the cluster.
11. Virtual server.
A combination of configuration information and cluster resources, such as an IP address, a network name, and application resources.
Applications and services running on a server cluster can be exposed to users and workstations as virtual servers. To users and clients, connecting to an application or service running as a clustered virtual server appears to be the same process as connecting to a single, physical server. In fact, the connection to a virtual server can be hosted by any node in the cluster. The user or client application will not know which node is actually hosting the virtual server.
12. Shared storage.
All nodes in the failover cluster must be able to access data on shared storage. The highly available workloads write their data to this shared storage. Therefore, if a node fails, when the resource is restarted on another node, the new node can read the same data from the shared storage that the previous node was accessing. Shared storage can be created with iSCSI, Serial Attached SCSI, or Fibre Channel, provided that it supports persistent reservations.
13. LUN
LUN stands for Logical Unit Number. A LUN is used to identify a disk or a disk volume that is presented to a host server, or to multiple hosts, by a shared storage array or a SAN. LUNs provided by shared storage arrays and SANs must meet many requirements before they can be used with failover clusters, but once they do, the nodes of the cluster must have exclusive access to these LUNs. Storage volumes or logical unit numbers (LUNs) exposed to the nodes in a cluster must not be exposed to other servers, including servers in another cluster.
14. Services and Applications group
Cluster resources are contained within a cluster in a logical set called a Services and Applications group, historically referred to as a cluster group. Services and Applications groups are the units of failover within the cluster. When a cluster resource fails and cannot be restarted automatically, the Services and Applications group that the resource is a part of will be taken offline, moved to another node in the cluster, and brought back online.
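The restart-then-move behaviour just described can be expressed as a simple policy. The Python below is only a minimal sketch under assumed names: the Node class, the restart limit, and the method names are hypothetical illustrations, not a real WSFC API.

# Minimal sketch of the restart-then-move policy for a Services and
# Applications group; all names and limits here are illustrative assumptions.
RESTART_LIMIT = 3

class Node:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.groups = set()          # groups currently online on this node

    def try_restart(self, resource):
        # Pretend a restart only succeeds while the node is healthy.
        return self.healthy

    def bring_online(self, group):
        self.groups.add(group)

    def take_offline(self, group):
        self.groups.discard(group)

def handle_resource_failure(group, resource, owner, other_nodes):
    # 1. Try to restart the failed resource in place.
    for _ in range(RESTART_LIMIT):
        if owner.try_restart(resource):
            return owner
    # 2. Restart failed: take the whole group offline and move it.
    owner.take_offline(group)
    for node in other_nodes:
        if node.healthy:
            node.bring_online(group)
            return node
    return None  # the group stays offline if no healthy node remains

node1, node2 = Node("NODE1", healthy=False), Node("NODE2")
new_owner = handle_resource_failure("FileServices", "Disk1", node1, [node2])
print(new_owner.name)  # NODE2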
15. Quorum
The cluster quorum maintains the definitive cluster configuration data and the current state of each node, each Services and Applications group, and each resource and network in the cluster. Furthermore, when each node reads the quorum data, depending on the information retrieved, the node determines whether it should remain available, shut down the cluster, or activate any particular Services and Applications groups on the local node. To extend this even further, failover clusters can be configured to use one of four different cluster quorum models, and essentially the quorum type chosen for a cluster defines the cluster. For example, a cluster that utilizes the Node and Disk Majority quorum can be called a Node and Disk Majority cluster.
A quorum is simply a configuration database for the Microsoft Cluster service, stored in the quorum log file. A standard quorum uses a quorum log file that is located on a disk hosted on a shared storage interconnect that is accessible by all members of the cluster.
Why quorum is necessary
When network problems occur, they can interfere with communication between cluster nodes. A small set of nodes might be able to communicate together across a functioning part of a network but might not be able to communicate with a different set of nodes in another part of the network. This can cause serious issues. In this "split" situation, at least one of the sets of nodes must stop running as a cluster.
To prevent the issues that are caused by a split in the cluster, the cluster software requires that any set of nodes running as a cluster use a voting algorithm to determine whether, at a given time, that set has quorum. Because a given cluster has a specific set of nodes and a specific quorum configuration, the cluster will know how many votes constitute a majority (that is, a quorum). If the number drops below the majority, the cluster stops running. Nodes will still listen for the presence of other nodes, in case another node appears again on the network, but the nodes will not begin to function as a cluster until quorum exists again.
For example, in a five-node cluster that is using Node Majority, consider what happens if nodes 1, 2, and 3 can communicate with each other but not with nodes 4 and 5. Nodes 1, 2, and 3 constitute a majority, and they continue running as a cluster. Nodes 4 and 5 are a minority and stop running as a cluster, which prevents the problems of a split situation. If node 3 then loses communication with the other nodes, all nodes stop running as a cluster. However, all functioning nodes will continue to listen for communication, so that when the network begins working again, the cluster can form and begin to run.
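The vote counting in this example can be written out explicitly. The short Python sketch below simply applies the Node Majority rule of "more than half of the configured votes"; it is an illustration of the arithmetic, not cluster code.

def has_quorum(votes_held, total_votes):
    # Node Majority rule: a partition keeps running only if it holds
    # more than half of the cluster's configured votes.
    return votes_held > total_votes // 2

TOTAL_VOTES = 5  # the five-node cluster from the example above

print(has_quorum(3, TOTAL_VOTES))  # True:  nodes 1, 2, 3 keep running as the cluster
print(has_quorum(2, TOTAL_VOTES))  # False: nodes 4 and 5 stop running as a cluster
# If node 3 is then lost, no surviving partition holds 3 of the 5 votes,
# so all nodes stop running as a cluster until connectivity returns.
print(has_quorum(2, TOTAL_VOTES))  # False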
There are four quorum modes:
Node Majority: Each node that is available and in communication can vote. The cluster functions only with a majority of the votes, that is, more than half.
Node and Disk Majority: Each node plus a designated disk in the cluster storage (the disk witness) can vote, whenever they are available and in communication. The cluster functions only with a majority of the votes, that is, more than half.
Node and File Share Majority: Each node plus a designated file share created by the administrator (the file share witness) can vote, whenever they are available and in communication. The cluster functions only with a majority of the votes, that is, more than half.
No Majority (Disk Only): The cluster has quorum if one node is available and in communication with a specific disk in the cluster storage. Only the nodes that are also in communication with that disk can join the cluster. This is equivalent to the quorum disk in Windows Server 2003. The disk is a single point of failure, so only select scenarios should implement this quorum mode.
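For the three majority-based modes, the only difference is how many votes exist in total (the nodes alone, or the nodes plus a witness). The sketch below works out the resulting quorum threshold for a few hypothetical cluster sizes; the node counts are assumptions chosen for illustration.

def quorum_threshold(node_count, witness_votes=0):
    # Total votes = node votes plus an optional witness (disk or file share).
    # Quorum requires more than half of the total, i.e. floor(total/2) + 1.
    total = node_count + witness_votes
    return total // 2 + 1

print(quorum_threshold(5))                   # Node Majority, 5 nodes:             3 votes needed
print(quorum_threshold(4, witness_votes=1))  # Node and Disk Majority, 4 + disk:   3 votes needed
print(quorum_threshold(2, witness_votes=1))  # Node and File Share Majority, 2 + 1: 2 votes needed
# No Majority (Disk Only) does not use this arithmetic: quorum follows whichever
# node owns the designated disk, making that disk a single point of failure.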
16. Witness disk. The witness disk is a disk in the cluster storage that is designated to hold a copy of the cluster configuration database. (A witness disk is part of some, but not all, quorum configurations.)
Configuration of a Two-Node Failover Cluster and Quorum Configuration:
A multi-site cluster is a disaster recovery solution and a high availability solution rolled into one. A multi-site cluster gives you the best recovery point objective (RPO) and recovery time objective (RTO) available for your critical applications. With Windows Server 2008 failover clustering, a multi-site cluster has become much more feasible thanks to the introduction of cross-subnet failover and support for high-latency network communications.
Which editions include failover clustering?
The failover cluster feature is available in Windows Server 2008 R2 Enterprise and Windows Server 2008 R2 Datacenter. The feature is not available in Windows Web Server 2008 R2 or Windows Server 2008 R2 Standard.
Network Considerations
All Microsoft failover clusters must have redundant network communication paths. This ensures that a failure of any one communication path will not result in a false failover and ensures that your cluster remains highly available. A multi-site cluster has this requirement as well, so you will want to plan your network with that in mind. There are generally two things that will have to travel between nodes: replication traffic and cluster heartbeats. In addition, you will also need to consider client connectivity and cluster management activity.
Quorum model:
For a 2-node multi-site cluster configuration, the Microsoft-recommended configuration is a Node and File Share Majority quorum.
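The vote math makes this recommendation easy to see. As a rough sketch of the arithmetic (the node and witness counts below simply mirror the 2-node scenario described here): two nodes alone hold an even number of votes, so a split between the sites leaves neither side with a majority, while a file share witness at a third site breaks the tie.

def partition_has_quorum(votes_held, total_votes):
    # Same "more than half" rule used by the majority-based quorum modes.
    return votes_held > total_votes // 2

# Two nodes, no witness: 2 votes total. After a split between the sites,
# each node holds only 1 of 2 votes, so the whole cluster goes offline.
print(partition_has_quorum(1, 2))   # False for both sides

# Two nodes plus a file share witness at a third site: 3 votes total.
# The node that can still reach the witness holds 2 of 3 votes and stays up.
print(partition_has_quorum(2, 3))   # True  (surviving node + witness)
print(partition_has_quorum(1, 3))   # False (isolated node)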
Step 1: Configure the Cluster
Add the Failover Clustering feature to both nodes of your cluster by following the steps below:
1. Click Start, click Administrative Tools, and then click Server Manager. (If the User Account Control dialog box appears, confirm that the action it displays is what you want, and then click Continue.)
2. In Server Manager, under Features Summary, click Add Features. Select Failover Clustering, and then click Install.
3. Follow the instructions in the wizard to complete the installation of the feature. When the wizard finishes, close it.
4. Repeat the process for each server that you want to include in the cluster.
5. Next, have a look at your network connections. It is best to rename the connections on each of your servers to reflect the network they represent; this will make things easier to remember later. Go to the properties of the Cluster (or private) network and uncheck "Register this connection's addresses in DNS".
6. Next, go to the Advanced Settings of your Network Connections (press Alt to see the Advanced Settings menu) on each server and make sure the Public network (LAN) is first in the list.
7. Your private network should contain only an IP address and subnet mask; no default gateway or DNS servers should be defined. Your nodes need to be able to communicate across this network, so add static routes if necessary.
Step 2: Validate the Cluster Configuration
1. Open the Failover Cluster Manager and click Validate a Configuration.
2. The Validation Wizard launches and presents the first screen, shown below. Add the two servers in your cluster and click Next to continue.
3. We need this cluster to be supported, so we must run all of the required tests.
4. Select Run all tests.
5. Click Next until the wizard produces a report like the one below.
When you click View Report, it displays a report similar to the one below:
Step 3: Create a Cluster
In the Failover Cluster Manager, click Create a Cluster. The next step is to provide a name for this cluster and an IP address for administering it. This will be the name that you will use to administer the cluster, not the name of the SQL cluster resource that you will create later. Enter a unique name and IP address and click Next.
Note: This is also the computer name that will need permission to the File Share Witness, as described later in this document.
Confirm your choices and click Next. Click Next through to Finish; the wizard will create the cluster with the name MYCLUSTER.
Step 4: Implementing a Node and File Share Majority Quorum
First, we need to identify the server that will hold the File Share witness. This File Share witness should be located in a third location, accessible by both nodes of the cluster. Once you have identified the server, share a folder as you normally would. In my case, I created a share called MYCLUSTER on a server named NYDC01.
The key thing to remember about this share is that you must give the cluster computer account read/write permissions on the MYCLUSTER share at both the share level and the NTFS level.
Now, with the shared folder in place and the appropriate permissions assigned, you are ready to change your quorum type. From Failover Cluster Manager, right-click your cluster, choose More Actions, and then Configure Cluster Quorum Settings.
On the next screen, choose Node and File Share Majority and click Next.
On the following screen, enter the path to the file share you previously created and click Next.
Confirm that the information is correct, click Next through to the summary page, and click Finish.
Now when you view your cluster, the Quorum Configuration should say Node and File Share Majority, as shown below.
The steps I have outlined up until this point apply to any multi-site cluster, whether it is a SQL, Exchange, File Server, or other type of failover cluster. The next step in creating a multi-site cluster involves integrating your storage and replication solution into the failover cluster.