
Clustering Technologies

Updated: March 28, 2003

Applies To: Windows Server 2003, Windows Server 2003 R2, Windows Server 2003 with SP1, Windows
Server 2003 with SP2

Clustering Technologies
The clustering technologies in products in the Microsoft Windows Server 2003 operating system are
designed to help you achieve high availability and scalability for applications that are critically important to
your business. These applications include corporate databases, e-mail, and Web-based services such as
retail Web sites. By using appropriate clustering technologies and carefully implementing good design and
operational practices (for example, configuration management and capacity management), you can scale
your installation appropriately and ensure that your applications and services are available whenever
customers and employees need them.

High availability is the ability to provide user access to a service or application for a high percentage of
scheduled time by attempting to reduce unscheduled outages and mitigate the impact of scheduled
downtime for particular servers. Scalability is the ability to easily increase or decrease computing capacity.
A cluster consists of two or more computers working together to provide a higher level of availability,
scalability, or both than can be obtained by using a single computer. Availability is increased in a cluster
because a failure in one computer results in the workload being redistributed to another computer.
Scalability tends to be increased, because in many situations it is easy to change the number of
computers in the cluster.

Windows Server 2003 provides two clustering technologies: server clusters and Network Load Balancing
(NLB). Server clusters primarily provide high availability; Network Load Balancing provides scalability and
at the same time helps increase availability of Web-based services.

Your choice of cluster technologies (server clusters or Network Load Balancing) depends primarily on
whether the applications you run have long-running in-memory state:

• Server clusters are designed for applications that have long-running in-memory state or frequently updated data. These are called stateful applications. Examples of stateful applications include database applications such as Microsoft SQL Server 2000 and messaging applications such as Microsoft Exchange Server 2003. Server clusters can combine up to eight servers.

• Network Load Balancing is intended for applications that do not have long-running in-memory state. These are called stateless applications. A stateless application treats each client request as an independent operation, and therefore it can load-balance each request independently. Stateless applications often have read-only data or data that changes infrequently. Web front-end servers, virtual private networks (VPNs), and File Transfer Protocol (FTP) servers typically use Network Load Balancing. Network Load Balancing clusters can also support other TCP- or UDP-based services and applications. Network Load Balancing can combine up to 32 servers.

In addition, with Microsoft Application Center 2000 Service Pack 2, you can create another type of cluster,
a Component Load Balancing cluster. Component Load Balancing clusters balance the load between Web-
based applications distributed across multiple servers and simplify the management of those applications.
Application Center 2000 Service Pack 2 can be used with Web applications built on either the Microsoft
Windows 2000 or Windows Server 2003 operating systems.
Multitiered Approach for Deployment of Multiple Clustering Technologies
Microsoft does not support the configuration of server clusters and Network Load Balancing clusters on the
same server. Instead, use these technologies in a multitiered approach.

Clustering Technologies Architecture


A cluster consists of two or more computers (servers) working together. For server clusters, the individual
servers are called nodes. For Network Load Balancing clusters, the individual servers are called hosts.

Basic Architecture for Server Clusters


The following diagram shows a four-node server cluster of the most common type, called a single quorum
device cluster. In this type of server cluster, there are multiple nodes with one or more cluster disk arrays
(often called the cluster storage) and a connection device (bus). Each of the disks in the disk array is
owned and managed by only one node at a time. The quorum resource on the cluster disk array provides
node-independent storage for cluster configuration and state data, so that each node can obtain that data
even if one or more other nodes are down.

Four-Node Server Cluster Using a Single Quorum Device

Basic Architecture for Network Load Balancing Clusters


The following diagram shows a Network Load Balancing cluster with eight hosts. Incoming client requests
are distributed across the hosts. Each host runs a separate copy of the desired server application, for
example, Internet Information Services. If a host failed, incoming client requests would be directed to
other hosts in the cluster. If the load increased and additional hosts were needed, you could add them
dynamically to the cluster.

Network Load Balancing Cluster with Eight Hosts
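The state of a Network Load Balancing cluster can be checked from the command line on any cluster host with the NLB control utility (Nlb.exe, the successor to Wlbs.exe). The following is a minimal sketch; the exact output depends on your configuration:

    rem Display the convergence state of the cluster and the list of member hosts
    nlb query

    rem Display the cluster parameters, such as the cluster IP address and operation mode
    nlb params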


Clustering Technologies Scenarios
This section describes the most common scenarios for using server clusters and Network Load Balancing.

Scenarios for Server Clusters


This section provides brief descriptions of some of the scenarios for server cluster deployment. The
scenarios cover three different aspects of server cluster deployment:

• The applications or services on the server cluster.

• The type of storage option: SCSI, Fibre Channel arbitrated loops, or Fibre Channel switched fabric.

• The number of nodes and the ways that the nodes can fail over to each other.

Applications or Services on a Server Cluster


Server clusters are usually used for services, applications, or other resources that need high availability.
Some of the most common resources deployed on a server cluster include:

• Printing

• File sharing

• Network infrastructure services. These include the DHCP service and the WINS service.

• Services that support transaction processing and distributed applications. These services include the Distributed Transaction Coordinator (DTC) and Message Queuing.

• Messaging applications. An example of a messaging application is Microsoft Exchange Server 2003.

• Database applications. An example of a database application is Microsoft SQL Server 2000.

Types of Storage Options


A variety of storage solutions are currently available for use with server clusters. As with all hardware that
you use in a cluster, be sure to choose solutions that are listed as compatible with Windows Server 2003,
Enterprise Edition, or Windows Server 2003, Datacenter Edition. Also be sure to follow the vendor’s
instructions closely.

The following table provides an overview of the three types of storage options available for server clusters
running Windows Server 2003, Enterprise Edition, or Windows Server 2003, Datacenter Edition:

Storage Options for Server Clusters

Storage Option                   Maximum Number of Supported Nodes
SCSI                             Two
Fibre Channel arbitrated loop    Two
Fibre Channel switched fabric    Eight


Number of Nodes and Failover Plan
Another aspect of server cluster design is the number of nodes used and the plan for application failover:

• N-node Failover Pairs. In this mode of operation, each application is set to fail over between two specified nodes. (A sketch of how to restrict failover in this way appears in the example following this list.)

• Hot-Standby Server/N+I. Hot-standby server operation mode reduces the overhead of failover pairs by consolidating the “spare” (idle) node for each pair into a single node, providing a server that is capable of running the applications from each node pair in the event of a failure. This mode of operation is also referred to as active/passive. For larger clusters, N+I mode provides an extension of the hot-standby server mode in which N cluster nodes host applications and I cluster nodes are spare nodes.

• Failover Ring. In this mode of operation, each node in the cluster runs an application instance. In the event of a failure, the application on the failed node is moved to the next node in sequence.

• Random. For large clusters running multiple applications, the best policy in some cases is to allow the server cluster to choose the failover node at random.
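As a hedged sketch of the failover pair model (the group, resource, and node names below are placeholders), Cluster.exe can restrict the preferred owners of a group, and the possible owners of its resources, so that an application fails over only between two specified nodes:

    rem Limit the preferred owners of the application's group to a specific pair of nodes
    cluster group "SQL Group" /setowners:NodeA,NodeB

    rem Remove a third node from the possible owners of a resource in that group
    cluster resource "SQL IP Address" /removeowner:NodeC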

Scenarios for Network Load Balancing


This section provides brief descriptions of some of the scenarios for deployment of Network Load Balancing. The scenarios cover two aspects of Network Load Balancing deployment:

• The types of servers or services in Network Load Balancing clusters.

• The number and mode of network adapters on each host.


Types of Servers or Services in Network Load Balancing Clusters
In Network Load Balancing clusters, some of the most common types of servers or services are as follows:

• Web and File Transfer Protocol (FTP) servers.

• ISA servers (for proxy servers and firewall services).

• Virtual private network (VPN) servers.

• Windows Media servers.

• Terminal servers.

Number and Mode of Network Adapters on Each Network Load Balancing Host
Another aspect of the design of a Network Load Balancing cluster is the number and mode of the network
adapter or adapters on each of the hosts:

• Single network adapter in unicast mode: A cluster in which ordinary network communication among cluster hosts is not required and in which there is limited dedicated traffic from outside the cluster subnet to specific cluster hosts.

• Multiple network adapters in unicast mode: A cluster in which ordinary network communication among cluster hosts is necessary or desirable. It is also appropriate when you want to separate the traffic used to manage the cluster from the traffic occurring between the cluster and client computers.

• Single network adapter in multicast mode: A cluster in which ordinary network communication among cluster hosts is necessary or desirable but in which there is limited dedicated traffic from outside the cluster subnet to specific cluster hosts.

• Multiple network adapters in multicast mode: A cluster in which ordinary network communication among cluster hosts is necessary and in which there is heavy dedicated traffic from outside the cluster subnet to specific cluster hosts.

Single Copy Clusters


Applies to: Exchange Server 2007 SP3, Exchange Server 2007 SP2, Exchange Server 2007 SP1,
Exchange Server 2007

Topic Last Modified: 2007-11-21

A single copy cluster (SCC) is a clustered mailbox server that uses shared storage in a failover cluster
configuration to allow multiple servers to manage a single copy of the storage groups. This feature is
similar to the clustering features in previous versions of Microsoft Exchange. However, there are some
significant changes and improvements that have been made. The way in which you build, manage, and
troubleshoot an SCC is completely different from the way in which previous versions of Exchange clusters
were built and managed. In addition, the out-of-box failover behavior has changed significantly in
Microsoft Exchange Server 2007.

These changes and improvements include:


• Improved setup experience Setup of a clustered mailbox server in a single copy cluster is

very different from the setup process used in previous versions of Exchange. In previous
versions, when Exchange Setup completed, there were additional tasks that needed to be
performed using Cluster Administrator before a clustered mailbox server (referred to in previous
versions as an Exchange Virtual Server) was created. In Exchange 2007, clustered mailbox
server installation is integrated into Exchange Setup. As a result, the clustered and non-clustered
Setup experience is similar, reducing the learning curve traditionally associated with clustered
applications. In addition, when Setup has completed, a clustered mailbox server will have been
created.

• Improved management experience In previous versions, Cluster Administrator was required

for several management tasks, such as stopping and starting a clustered mailbox server and
moving a clustered mailbox server between nodes in the cluster. In Exchange 2007, these tasks
and new clustered mailbox server management tasks have been integrated into the Exchange
management tools. For example, you can use the Exchange Management Shell to stop, start, and move clustered mailbox servers, as shown in the example following this list. In addition, in Microsoft Exchange 2007 SP1, you can also use the Exchange Management Console to stop, start, and move clustered mailbox servers.

• Optimized default settings In previous versions, after the clustered mailbox server was

created, additional administrative tasks had to be manually performed to configure the clustered
mailbox server for optimal behavior. In Exchange 2007, each clustered mailbox server is
configured with the optimal settings during Setup, eliminating the need for an administrator to
perform these tasks manually.
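For example, the following Exchange Management Shell commands illustrate the integrated management experience. The clustered mailbox server name (EXCMS1) is a placeholder, and this is a sketch rather than a complete procedure:

    # Check the state of the clustered mailbox server and the node that currently hosts it
    Get-ClusteredMailboxServerStatus -Identity EXCMS1

    # Stop the clustered mailbox server, recording the reason, and then start it again
    Stop-ClusteredMailboxServer -Identity EXCMS1 -StopReason "Planned maintenance" -Confirm:$false
    Start-ClusteredMailboxServer -Identity EXCMS1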

As illustrated in the following figure, SCCs require the use of a shared-nothing architecture, which includes
shared disk storage. In a shared-nothing architecture, although all nodes in the cluster can access shared
data, they cannot access it at the same time. For example, if a physical disk resource is assigned to
node 1 of a two-node cluster, node 2 cannot access the disk resource until node 1 is taken offline, fails, or
the disk resource is moved to node 2 manually.

Basic architecture of an SCC

In an SCC, an Exchange 2007 Mailbox server uses its own network identity, not the identity of any node in
the cluster. This network identity is referred to as a clustered mailbox server. If the node running a
clustered mailbox server experiences problems, the clustered mailbox server goes offline for a brief period
until another node takes control of the clustered mailbox server and brings it online. This process is known
as failover. The storage hosting the clustered mailbox server's storage groups and databases is hosted on
shared storage that is available to each possible host node of the clustered mailbox server. As the failover
occurs, the storage associated with the clustered mailbox server is logically detached from the failed node and placed under the control of the new host node of the clustered mailbox server.

In addition to failover, an administrator can manually move a clustered mailbox server between nodes in a cluster. This process is known as a handoff. A handoff should only be performed by using the Move-ClusteredMailboxServer cmdlet in the Exchange Management Shell or, if running Exchange 2007 SP1, by using the Manage Clustered Mailbox Server wizard in the Exchange Management Console.

Note:

Cluster Administrator and Cluster.exe provide mechanisms for moving resource groups between nodes
in a cluster. When performing a handoff of a clustered mailbox server in an SCC, we recommend that
you use the Move-ClusteredMailboxServer cmdlet or the Manage Clustered Mailbox Server wizard
instead of using the cluster management tools because both the cmdlet and the wizard enable the
administrator to specify a reason for the handoff.
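For example, a handoff of a hypothetical clustered mailbox server named EXCMS1 to a node named NODE2 can be performed from the Exchange Management Shell; the MoveComment parameter records the reason for the handoff:

    # Hand off the clustered mailbox server to NODE2 and record why the move was made
    Move-ClusteredMailboxServer -Identity EXCMS1 -TargetMachine NODE2 -MoveComment "Installing updates on NODE1" -Confirm:$false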
Windows Failover Cluster
To create an SCC for Exchange 2007, you must create a failover cluster using the Cluster service. Failover
clustering is included in Windows Server 2003 Enterprise and Datacenter Edition and
Windows Server 2008 Enterprise and Datacenter operating systems. In both operating systems, the
Cluster service is the essential software component that controls all aspects of clustered resources. When
Exchange 2007 Setup is run on a node of a failover cluster, the cluster-aware version of Exchange is
automatically installed. A clustered mailbox server in an SCC environment contains the following
elements:

• Shared storage Exchange 2007 supports both shared storage and non-shared storage

clusters. SCCs use shared storage. For more information about non-shared storage Exchange
clusters, see Cluster Continuous Replication.

• Resource DLL Windows communicates with resources in a cluster by using a resource

dynamic-link library (DLL). Exchange 2007 provides its own custom resource DLL (named
Exres.dll) to communicate with the Cluster service. Communication between the Cluster service
and Exchange 2007 is customized to provide cluster functionality, including failover.

• Groups Exchange 2007 uses Windows cluster groups to contain and represent a clustered

mailbox server. In an SCC, a clustered mailbox server is a cluster group containing clustered
Exchange resources, such as an IP address, one or more physical disk resources, the
Microsoft Exchange System Attendant service, and other Exchange resources.

• Resources Clustered mailbox servers include resources, such as IP Address resources, Network

Name resources, and physical disk resources. Clustered mailbox servers also include their own
Exchange-specific resources. When creating a clustered mailbox server, Exchange automatically
creates the other essential Exchange-related resources, such as the Microsoft Exchange System
Attendant service, the Microsoft Exchange Information Store service, and one or more
Microsoft Exchange storage group or database instances.

Deploying Single Copy Clusters


Deploying an SCC is similar to deploying a stand-alone Exchange server, and it is similar to deploying
cluster continuous replication (CCR). However, there are some significant differences to be aware of when
deploying SCCs. We recommend that you review the following topics:

• Planning for Single Copy Clusters

• Installing a Single Copy Cluster on Windows Server 2003

• Installing a Single Copy Cluster on Windows Server 2008

• Managing Single Copy Clusters

• Uninstalling Clustered Mailbox Servers


Single Copy Cluster Resource Model
Applies to: Exchange Server 2007 SP3, Exchange Server 2007 SP2, Exchange Server 2007 SP1,
Exchange Server 2007

Topic Last Modified: 2007-10-26

Microsoft Exchange Server 2007 clusters create logical servers that are referred to as clustered mailbox
servers by using the Microsoft Windows Server Cluster service. Exchange 2007 only supports
active/passive single copy cluster (SCC) configurations. In previous versions of Exchange, clustered
mailbox servers were referred to as Exchange Virtual Servers. Unlike a stand-alone (non-clustered)
Exchange 2007 Mailbox server, a clustered mailbox server is mobile and can be failed over if the server
currently running the clustered mailbox server fails. When the computer that is actively servicing client
requests for Exchange in the cluster fails, one of the remaining nodes in the cluster takes over for the
failed clustered mailbox server, and clients can access this server by using the same Exchange Server
name.

To create an Exchange 2007 cluster, you first create a Windows Server 2003 cluster and then install
Exchange on each node of the cluster. It is also necessary to use Setup to create each clustered mailbox
server in the cluster.
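As a hedged sketch (the clustered mailbox server name, IP address, and data path are placeholders, and the parameters should be verified against the Exchange 2007 Setup documentation for your service pack level), the command-line form of an SCC installation on the first node looks similar to the following:

    rem Install the Mailbox server role on the cluster node (the cluster-aware version is installed automatically)
    Setup.com /mode:Install /roles:Mailbox

    rem Create the clustered mailbox server on shared storage
    Setup.com /NewCms /CmsName:EXCMS1 /CmsIPAddress:192.168.10.50 /CmsSharedStorage /CmsDataPath:"E:\ExchangeData"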

A clustered mailbox server is a resource group that requires, at a minimum, the following resources:

• IP address

• Network name

• One or more physical disks for shared storage

• Several Exchange-specific resources

Client computers connect to a clustered mailbox server the same way that they connect to a stand-alone
computer running Exchange 2007. Exchange 2007 Setup automatically creates the IP address, network
name, and Exchange-specific resources. It is your responsibility to populate the physical disks in the
resource group.
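The resulting resource group and its resources can be reviewed with the Cluster.exe command-line tool; for example:

    rem List the resource groups in the cluster and the node that currently owns each group
    cluster group

    rem List all cluster resources, the group each belongs to, and their online/offline state
    cluster resource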

Active/Passive Clusters
Exchange 2007 supports only clusters that have at least one passive node, for example, two active nodes
and a passive node. In active/passive clusters, the cluster includes at least one primary or active node and
at least one secondary or passive node. The secondary nodes are idle until a failover occurs on a primary
node. When the primary node in an active/passive cluster fails or is taken offline, the clustering feature in
Windows takes over. The failed node is taken offline, and a secondary node takes over the operations of
the failed node. It usually takes only a few minutes for the cluster to fail over to another node. As a result,
the Exchange resources on your cluster are unavailable to users for only a brief period of time.

In active/passive clusters, the number of clustered mailbox servers in the cluster is always less than the
number of physical nodes in the cluster.

Quorum Disk Resource


In addition to the resource groups created for each clustered mailbox server, an SCC always has a
resource group to represent the quorum of the cluster. This resource group, by default named Cluster
Group, is created when the cluster is created. For an SCC, the default quorum strategy uses a shared disk.
However, a Majority Node Set (MNS) quorum with file share witness is also supported for SCCs. In a
shared disk quorum, the disk containing the quorum resource is called the quorum disk, and it must be a
member of the default Cluster Group.

Note:

Failure of the quorum disk takes the cluster offline. This stops all clustered mailbox servers that the
cluster supports.
The quorum disk maintains configuration data in the quorum log, cluster database checkpoint, and
resource checkpoints. The quorum disk resource also provides persistent physical storage across system
failures. Because the cluster configuration is kept on a quorum disk resource, all nodes in the cluster must
be able to communicate with the node that owns it.
When a cluster is created or when network communication between nodes in a cluster fails, the quorum
disk resource prevents the nodes from forming multiple clusters. To form a cluster, a node must arbitrate
for, and gain ownership of, the quorum disk resource. For example, if a node cannot detect a cluster
during the discovery process, the node attempts to form its own cluster by taking control of the quorum
disk resource. However, if the node does not succeed in taking control of the quorum disk resource, it
cannot form a cluster.

The quorum disk resource stores the most current version of the cluster configuration data. This data
contains cluster configuration and state data for each individual node. When a node joins or forms a
cluster, the Cluster service updates the node's individual copy of the configuration database. When a node
joins an existing cluster, the Cluster service retrieves the configuration data from the other active nodes.

The Cluster service uses the quorum disk resource recovery logs to do the following:

• Guarantee that only one set of active, communicating nodes can operate as a cluster.

• Enable a node to form a cluster only if it can gain control of the quorum disk resource.

• Allow a node to join or remain in an existing cluster only if it can communicate with the node that

controls the quorum resource.

Note:

You should never create a clustered mailbox server in the default Cluster Group. Clustered mailbox
servers are supported only in dedicated resource groups.
Single Copy Cluster Resource Model
SCCs are based on creating a logical cluster model, called the resource model, which abstracts the
individual resources that make up a clustered mailbox server, such as the Microsoft Exchange Information
Store service and the databases maintained by the clustered mailbox server. The resource model is
represented as a tree to allow the software to define an order of actions when a piece of the model starts
or stops.

The following are significant differences between the Exchange 2007 resource model and the
Exchange Server 2003 resource model:

• There is a new cluster resource called the Microsoft Exchange Database Instance.

• Physical disk resources are now dependencies for database instances.

• The Microsoft Exchange Information Store service no longer depends on the Microsoft Exchange System Attendant service. Thus, the Microsoft Exchange Information Store cluster resource does not depend on the Microsoft Exchange System Attendant resource.

• Only the IP Address and Network Name resources should have the Affect the Group option selected.

Databases, the Microsoft Exchange Information Store service, and other resources can be brought online
and offline manually by an administrator or automatically by the Cluster service. The term online generally
refers to the process of starting a service and mounting a database. The term offline generally refers to
the process of stopping a service, dismounting a database, or stopping an entire clustered mailbox server.
The Cluster service tracks the online and offline states of resources in the resource model.

The resources are used as a focal point for managing the processes, databases, and network identities
associated with the clustered mailbox server. The following is a brief summary of each resource type in an
SCC:

• Microsoft Exchange Database Instance This resource represents a database that is hosted

on the clustered mailbox server. When this resource is online, the database is mounted. When
this resource is offline, the database is dismounted. By default, the Microsoft Exchange
Information Store is a dependency of each database instance. In addition, you must manually configure the appropriate disk dependencies for each database instance. All dependencies must be
online before the database instance comes online. By default, the Affect the Group option is
disabled for this resource.

• Microsoft Exchange Information Store This resource represents the clustered mailbox

server. When this resource is online, the Microsoft Exchange Information Store service is started
and capable of accepting MAPI traffic. The clustered mailbox server may or may not have
mounted databases. When it is offline, the Microsoft Exchange Information Store service is
stopped, and it is not capable of accepting MAPI traffic. By default, the Affect the Group option
is disabled for this resource.

• Microsoft Exchange System Attendant This resource represents the Microsoft Exchange

System Attendant service for the clustered mailbox server. When this resource is online, the
Microsoft Exchange System Attendant service is started. When this resource is offline, the
Microsoft Exchange System Attendant service is stopped. By default, the Affect the Group
option is disabled for this resource.

• Network Name (Name) This resource represents the network name of the clustered mailbox

server. When the Network Name resource is online, the name is associated with a network
adapter on the specified computer. When the Network Name resource is offline, the name is not
associated with a network adapter on the specified computer. By default, Kerberos authentication
and successful Domain Name System (DNS) registration are required for this resource. In
addition, by default, the Affect the Group option is enabled for this resource.

• IP Address (Name) This resource represents the IP address associated with the clustered

mailbox server. This IP address is bound to the clustered mailbox server network name in DNS.
When the IP Address resource is online, it is associated with a network adapter on the specified
computer. When the IP Address resource is offline, it is not associated with a network adapter on
the specified computer. By default, NetBIOS is enabled for this resource. In addition, by default,
the Affect the Group option is enabled for this resource.

In addition to these resources, SCCs also contain physical disk resources that represent the disks
containing the storage group and database files for the clustered mailbox servers in the SCC. When you
add physical disks to the cluster group containing the clustered mailbox server, in addition to making the
disks dependencies for the appropriate database, you must also clear the Affect the Group check box for
each physical disk resource. If this check box is left selected, a failure of the disk will cause a failover of the clustered mailbox server, which is contrary to the intended behavior.
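As a hedged sketch using hypothetical resource names, both settings can also be made with Cluster.exe. The RestartAction property is assumed here to correspond to the Affect the Group check box (a value of 1 restarts the resource without affecting the group); verify the property name and values in your environment before relying on them:

    rem Make the database instance depend on the physical disk resource that holds its files
    cluster resource "Exchange Database Instance (EXCMS1 - Mailbox Database)" /adddep:"Disk M:"

    rem Equivalent of clearing the Affect the Group check box on the physical disk resource
    cluster resource "Disk M:" /prop RestartAction=1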

Planning for Single Copy Clusters


Applies to: Exchange Server 2007 SP3, Exchange Server 2007 SP2, Exchange Server 2007 SP1,
Exchange Server 2007

Topic Last Modified: 2008-07-24

Although deploying a Microsoft Exchange Server 2007 single copy cluster (SCC) is similar to deploying a
stand-alone Exchange 2007 server, and similar to deploying cluster continuous replication (CCR), there
are important differences you must consider.

General Requirements for Single Copy Clusters


The general requirements for deploying an SCC are as follows:
• Make sure that you are running Domain Name System (DNS). Ideally, the DNS server should

accept dynamic updates. If the DNS server does not accept dynamic updates, you must create a
DNS Host (A) record for each clustered mailbox server and one for the cluster itself. Otherwise,
Exchange does not function properly. For more information about how to configure DNS for
Exchange, see Microsoft Knowledge Base article 322856, How to configure DNS to use with
Exchange Server.

• If your cluster nodes belong to a DNS zone that has a different name than the Active Directory directory service domain name that the computer joined, the
DNSHostName property does not include the subdomain name by default. In this situation, you
may need to change the DNSHostName property to ensure that some services, such as the File
Replication Service (FRS), work correctly. For more information, see Knowledge Base article
240942, Active Directory DNSHostName property does not include subdomain.

• All cluster nodes must be member servers in the same Active Directory domain and site.

Exchange 2007 is not supported on nodes that are also Active Directory directory servers, or
nodes that are members of different Active Directory domains or sites.

• Make sure the cluster is formed before installing Exchange 2007. Make sure that at least one

physical disk resource is present in the cluster group in which you intend to install
Exchange 2007 before installing Exchange 2007. After installing the clustered mailbox server,
configure the appropriate disk resource dependencies in Cluster Administrator.

• Make sure that the clustered mailbox server names are 15 characters or less.

• The cluster in which Exchange 2007 is installed cannot contain Exchange Server 2003,

Exchange 2000 Server, or any cluster-aware version of Microsoft SQL Server. Running
Exchange 2007 in a cluster with any of these other applications is not supported. Running
Exchange 2007 in a cluster with SQL Server Express Edition or another database application
(such as Microsoft Office Access) is permitted, provided that the database application is not
clustered.

• Before you install Exchange 2007, make sure that the folder to which you will install all of the

Exchange data on the physical disk resource is empty.

• You must install the same version of Exchange 2007 on all nodes in the cluster that are

configured as hosts of a clustered mailbox server. In addition, the operating system and the
Exchange files must be installed on the same paths and drives for all nodes in the cluster. This
requires that all computers have a similar, although not necessarily identical, disk configuration.

• Do not install, create, or move any resources from the default cluster group to the resource

group containing the clustered mailbox server. In addition, do not install, create, or move any
resources from the group containing the clustered mailbox server to the default cluster group.
The default cluster group should contain only the cluster IP Address, Network Name and quorum
resources. Moving or combining resources to or with the default cluster group is not supported.

Important:

Clusters running previous versions of Exchange require a clustered instance of the Microsoft Distributed
Transaction Coordinator (MSDTC). Exchange 2007 removes the requirement for the clustered MSDTC
resource. Clustered mailbox servers in an SCC do not use the MSDTC resource installed in the failover
cluster. Third-party applications might require an MSDTC resource because of COM+ dependencies. In
Windows Server 2003, the MSDTC cluster resource requires the use of shared storage in the cluster. If
a clustered MSDTC resource is required by a third-party application, it should be installed in a cluster
group that is separate from the group containing the clustered mailbox server. Windows Server 2008
provides a local, non-clustered MSDTC instance that removes the requirement for shared storage in a
Windows Server 2008 failover cluster. For more information about MSDTC changes in
Windows Server 2008, see Windows Server 2008 Help.
Hardware Requirements for Single Copy Clusters
The hardware requirements for SCC deployments are as follows:

• The entire solution must be listed in the Cluster Solution category in the Microsoft Windows Server Catalog of Tested Products.

• If the SCC is geographically dispersed, it must be listed in the Geographically Dispersed Cluster Solution category in the Microsoft Windows Server Catalog of Tested Products.

Software Requirements for Single Copy Clusters


The software requirements for SCC deployments are as follows:

• All nodes in the cluster must have the Windows Server 2008 Enterprise or Windows Server 2003 Enterprise Edition operating system installed, using the same boot and system drive letters and the same Windows path on each node. You cannot have a cluster with one or more nodes running Windows Server 2003 and other nodes running Windows Server 2008. Mixing operating system versions in a failover cluster is not supported.

• Only the Mailbox server role can be installed in a cluster. No other Exchange server roles can be

installed on a computer that is part of a failover cluster.

Network Requirements for Single Copy Clusters


It is important that the networks used for client and cluster communications are configured correctly. This
section provides links to the procedures that are necessary to verify that your private and public network
settings are configured correctly. In addition, you must make sure that the network connection order is
configured correctly for the cluster.

Consider the following when designing the network infrastructure for your SCC deployment:

• Each node must have at least two network adapters available for the cluster. Clients and other servers need to be able to reach the nodes through only one of the network adapters; the other network adapters are used for intra-cluster communication.

• You must have a sufficient number of static IP addresses available when you create clustered

mailbox servers. IP addresses are required for both the public and private networks.
Requirements related to private and public addresses are as follows:

• Private addresses Each node requires one static IP address for each network adapter

that will be used for the cluster private network. You must use static IP addresses that are not
on the same subnet or network as one of the public networks. We recommend that you use
10.10.10.x with a subnet mask of 255.255.255.0 as the private IP address subnet for the
private network. If your public network uses a 10.x.x.x network and a 255.255.255.0 subnet
mask, we recommend that you use alternate private network IP addresses and subnet mask.
If you configure more than one private network, unique addresses and subnets are required
for each private network adapter and network.

• Public addresses Each node requires one static IP address for each network adapter

that will be used for the cluster public network. Additionally, static IP addresses are required
for the server cluster and the clustered mailbox server so that they can be accessed by clients
and administrators. You must use static IP addresses that are not on the same subnet or
network as one of the private networks.

Note:

If you are installing SCC on Windows Server 2008, you can use a dynamically assigned Internet
Protocol version 6 (IPv6) address along with the static IPv4 addresses used for the private or public
networks.

• If you are installing SCC on Windows Server 2003, the Cluster service requires the private

network for all nodes in a cluster to be on the same subnet. To accomplish this in a
geographically dispersed environment, you can use virtual LAN (VLAN) switches on the
interconnects between two nodes. If you use a VLAN, the point-to-point, round-trip latency must
be less than 0.5 seconds. In addition, the link between two nodes must appear as a single point-
to-point connection from the perspective of the Windows operating system running on the nodes.
To avoid single points of failure, use independent VLAN hardware for the different paths between
the nodes. The same subnet restriction does not apply to failover clusters running on
Windows Server 2008.

• If you are installing SCC on Windows Server 2003, the Cluster service requires the public network

for all nodes in a cluster to be on the same subnet, and this subnet must be different from the
subnet used for the private network. The cluster public network should provide connectivity to
other Exchange servers and other services, such as Active Directory and DNS. You can prevent
this from being a single point of failure by using network adapter teaming or similar technology.
The same subnet restriction does not apply to failover clusters running on Windows Server 2008.

• A separate cluster private network must be provided. The private network is used for cluster

inter-node communication. This network can be localized to computers in the cluster and does
not require DNS services.

• If you are installing SCC on Windows Server 2003, the network connection order in Windows

must be configured with the public networks at the top of the connection order, and the network
priority in the cluster must be configured with the private networks at the top of the priority
order.

• The cluster heartbeat may not impose the most stringent bandwidth and latency requirements on the public network in a two-datacenter configuration. You must evaluate the total network load, which includes client, Active Directory, transport, and other traffic, to determine the necessary network capacity.

Storage Requirements for Single Copy Clusters


SCCs use shared storage to store clustered mailbox server data (storage groups and databases). The
quorum resource can also be stored on shared storage. An alternative to using shared storage for the
quorum resource is to use a Majority Node Set (MNS) quorum. This can be a traditional MNS quorum, or
an MNS quorum with file share witness.

It is critical that the following tasks be performed in the order shown for the SCC to operate correctly:

1. All shared storage must be configured prior to cluster formation on each node that will be part of
the cluster. It is mandatory that the quorum disk be configured and available to all nodes in the
cluster prior to cluster formation. The cluster formation will fail if the quorum is not available.
2. After the cluster has been formed, and before Exchange is installed, the physical disk resources
for the clustered mailbox server shared storage must be configured.
3. After Exchange has been installed and the clustered mailbox server has been created, the
physical disk resource dependencies must be configured.

Note:

Shared storage for a clustered mailbox server must be accessible from all nodes that can host the
clustered mailbox server.
When you design your SCC storage solution, we recommend that you follow these best practices:

• Use the general storage planning guidance in Planning Storage Configurations.

• Store the database files and transaction log files on different logical unit numbers (LUNs).

• Use NTFS file system volume mount points to surface the volumes to the operating system (see the example following the note below).

• Use recognizable names that can be directly and obviously tied to the hosted storage group or

database. If different volumes are used for logs and databases, the paths should identify the type
of data. This approach can help prevent human errors as the number of databases and storage
groups increases.

Note:

Exchange 2007 does not support placing transaction logs or database files in the root of a volume.
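For example, a log volume can be surfaced through a mount point under an empty NTFS folder with the Mountvol.exe tool. The folder names and the volume GUID below are placeholders chosen so that the path identifies the storage group and the type of data:

    rem Create an empty folder and mount the log volume for the first storage group beneath it
    mkdir E:\EXCMS1\SG1-Logs
    mountvol E:\EXCMS1\SG1-Logs \\?\Volume{00000000-0000-0000-0000-000000000000}\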
Active Directory Requirements for Single Copy Clusters
An SCC has the same Active Directory infrastructure requirements as a stand-alone server, plus some additional requirements. In a multiple datacenter solution, both datacenters must have adequate
Active Directory infrastructure support because, at any time, either datacenter could be hosting the
clustered mailbox server. This capacity needs to be present even if the other datacenters are not
available. Additionally, all nodes in the cluster must be in the same domain and the Cluster service
account must have the appropriate permissions.

Note:

Geographically dispersed clusters also require that a single Active Directory site be stretched between
the datacenters. However, only the clustered node needs to be in the site in the second datacenter.
Third-party hardware and replication technology is required to deploy a geographically dispersed SCC
solution.
Service Account Requirements for Single Copy Clusters
If you are installing SCC on Windows Server 2003, you must use a domain account for the Cluster service
account. All nodes in the cluster must be members of the same domain, and all nodes in the cluster must
use the same Cluster service account. The Cluster service account must also be a member of the local
administrators group on each node that is capable of hosting a clustered mailbox server.

The Cluster service account is responsible for creating and maintaining the computer account identified by
and associated with the failover cluster's Network Name resource when that resource is brought online. To
ensure that the Cluster service account has the appropriate permissions, see Knowledge Base article
307532, How to troubleshoot the Cluster service account when it modifies computer objects. Additional
information can be found in Knowledge Base article 251335, Domain Users Cannot Join Workstation or
Server to a Domain.

If you are installing SCC on Windows Server 2008, the Cluster service will run under the LocalSystem
(SYSTEM) account.

For More Information


For more information about failover clusters in Windows Server 2008 and their Windows Server 2003
predecessor, server clusters, see the following resources:
• Failover Clustering with Windows Server 2008

• Clustering Technologies

• Windows Server 2008 and Windows Server 2003 Help

Cluster Continuous Replication


Applies to: Exchange Server 2007 SP3, Exchange Server 2007 SP2, Exchange Server 2007 SP1,
Exchange Server 2007

Topic Last Modified: 2008-03-21

Cluster continuous replication (CCR) is a high availability feature of Microsoft Exchange Server 2007 that
combines the asynchronous log shipping and replay technology built into Exchange 2007 with the failover
and management features provided by the Cluster service.

CCR is designed to provide high availability for Exchange 2007 Mailbox servers by providing a solution
that:

• Has no single point of failure.

• Has no special hardware requirements.

• Has no shared storage requirements.

• Can be deployed in one or two datacenter configurations.

• Can reduce full backup frequency, reduce total backed up data volume, and shorten the service

level agreement (SLA) for recovery time from first failure.

CCR uses the database failure recovery functionality in Exchange 2007 to enable the continuous and
asynchronous updating of a second copy of a database with the changes that have been made to the
active copy of the database. During installation of the passive node in a CCR environment, each storage
group and its database is copied from the active node to the passive node. This operation is called
seeding, and it provides a baseline of the database for replication. After the initial seeding is performed,
log copying and replay are performed continuously.

In a CCR environment, the replication capabilities are integrated with the Cluster service to deliver a high
availability solution. In addition to providing data and service availability, CCR also provides for scheduled
outages. When updates need to be installed or when maintenance needs to be performed, an
administrator can move a clustered mailbox server (called an Exchange Virtual Server in previous versions
of Exchange Server) manually to a passive node. After the move operation is complete, the administrator
can then perform the needed maintenance.

The move operation is performed using the Move-ClusteredMailboxServer cmdlet in the Exchange
Management Shell. Microsoft Exchange Server 2007 Service Pack 1 (SP1) also includes a new Manage
Clustered Mailbox Server wizard in the Exchange Management Console that you can use to move clustered
mailbox servers. The logic used by these tasks performs the necessary enforcement to make sure that no
mailbox data is lost during the move. The tasks also perform checks before the move to make sure that
replication on the passive node is ready to be quickly activated.
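For example, before performing a scheduled move, the health of continuous replication can be checked from the Exchange Management Shell (the clustered mailbox server name is a placeholder):

    # Show the copy status and queue lengths for each storage group on the clustered mailbox server
    Get-StorageGroupCopyStatus -Server EXCMS1 | Format-Table Identity,SummaryCopyStatus,CopyQueueLength,ReplayQueueLength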

The key facts about CCR are as follows:

• Continuous replication is asynchronous Logs are not copied until they are closed and no

longer in use by the Mailbox server. This means that the passive node usually does not have a
copy of every log file that exists on the active node. The one exception is when the administrator
has initiated a scheduled outage of the active node to apply an update or perform some other
maintenance.
• Continuous replication places almost no CPU and input/output (I/O) load on the active

node during normal operation CCR uses the passive node to copy and replay the logs. Logs
are accessed by the passive node via a secured file share.

• Active and passive node changes over the lifetime of the cluster are designated

automatically For example, after a failover, the active and passive designation reverses. This
means the direction of replication reverses. No administrative action is required to reverse the
replication. The system manages the replication reversal automatically.

• Failover and scheduled outages are symmetric in function and performance For

example, it takes just as long to fail over from Node1 to Node2 as it does to fail over from Node2
to Node1. Typically, this would be under two minutes. On larger servers, scheduled outages
typically would be less than four minutes. The time difference between a failover and scheduled
outages is associated with the time it takes to do a controlled shutdown of the active node on a
scheduled outage. This performance difference may be reduced in a future service pack.

• Volume Shadow Copy Service (VSS) backups on the passive node are supported This

allows administrators to offload backups from the active node and extend the backup window. In
addition, larger configurations are not obligated by performance requirements to have hardware
VSS support to use VSS backups. The workload on the passive node is primarily log copying and
log replay, neither of which is constrained in real time like the clustered mailbox server on the
active node. For example, the active node has to respond to client requests in a timely way. A
longer backup window can be used, because the passive node has no real-time response
constraints, thereby allowing for larger databases and larger mailbox sizes.

• Total data on backup media is reduced The CCR passive copy provides the first line of

defense against corruption and data loss. Thus, a double failure is required before backups are
needed. Recovery from the first failure can have a relatively short SLA because no restore is
required. Recovery from the second failure can have a much longer SLA. As a result, backups can
be done on a weekly full cycle with a daily incremental backup strategy. This reduces the total
volume of data that must be placed on the backup media.

• CCR can be combined with standby continuous replication (SCR) CCR can be combined

with SCR to replicate storage groups locally in a primary data center (using CCR for high
availability) and remotely in a secondary or backup datacenter (using SCR for site resilience). The
secondary datacenter could contain a passive node in a failover cluster that hosts the SCR
targets. This type of cluster is called a standby cluster because it does not contain any clustered
mailbox servers, but it can be quickly provisioned with a replacement clustered mailbox server in
a recovery scenario. If the primary datacenter fails or is otherwise lost, the SCR targets hosted in
this standby cluster can be quickly activated on the standby cluster.

CCR Core Architecture


CCR combines the following elements:

• Failover and virtualization features provided by Microsoft failover clusters

• A majority-based failover cluster quorum model that uses a file share as a witness for cluster

activity

• Transaction log replication and replay features in Exchange 2007

• Message queue feature of the Hub Transport server called the transport dumpster
Windows Failover Cluster
As shown in the following figure, in Exchange 2007 SP1, CCR uses two computers (referred to as nodes)
joined in a single failover cluster running either Windows Server 2003 Service Pack 2 or
Windows Server 2008. The nodes in the failover cluster host a single clustered mailbox server. A node
that is currently running a clustered mailbox server is called the active node, and a node that is not
running a clustered mailbox server, but is part of the cluster, and the target for continuous replication, is
called the passive node. As a result of scheduled maintenance and unscheduled outages, the designation of
a node as active or passive will change several times throughout the lifetime of the failover cluster.

Basic deployment of CCR

The failover cluster is built using the Cluster service and a specific type of cluster quorum model:

• For Windows Server 2008, the recommended quorum is called a Node and File Share Majority

quorum. For detailed steps about how to configure a failover cluster to use the Node and File
Share Majority quorum model, see How to Configure the Node and File Share Majority Quorum.

• For Windows Server 2003, the recommended quorum is called a Majority Node Set (MNS)

quorum with file share witness. This quorum model is available in Windows Server 2003 Service
Pack 2 (SP2). It is also available in a hotfix for Windows Server 2003 SP1 described in Microsoft
Knowledge Base article 921181, An update is available that adds a file share witness feature and
a configurable cluster heartbeats feature to Windows Server 2003 Service Pack 1-based server
clusters; however, Exchange 2007 SP1 requires Windows Server 2003 Service Pack 2.

File Share Witness


Both of the preceding quorum models use a file share on a third computer as a witness. In these quorum
models, a file share on a third computer is used to avoid an occurrence of network partition within the
cluster, also known as split brain syndrome. Split brain syndrome occurs when all networks designated to
carry internal cluster communications fail, and nodes cannot receive heartbeat signals from each other.
Split brain syndrome is prevented by always requiring a majority of the two nodes and the file share to be
available and interacting for the clustered mailbox server to be operational. When a majority of the
computers are communicating, the cluster is said to have a quorum. The file share for the file share
witness can be hosted on any computer running Windows Server. There is no requirement that the version
of the Windows Server operating system hosting the file share match the operating system of the CCR
environment. However, we recommend that you use a Hub Transport server in the Active Directory
directory service site containing the clustered mailbox server to host the file share, because this allows a
messaging administrator to maintain control over the file share.

Note:

The file share used by the file share witness cannot be hosted in a Distributed File System (DFS)
environment.
The file share witness uses a file share on a computer outside the cluster to act as a witness to the activities of the two nodes that make up the cluster. The witness is used by the two nodes to track which node is in control of the cluster, and it is only required when the two nodes cannot communicate with each other. Consider a two-node clustered mailbox server made up of Node1 and Node2. In this case, Node1 is the node that can take control of the witness and is therefore able to take control of the cluster and bring up the clustered mailbox server. If Node2 is operational, but is unable to communicate with Node1, Node2 will try to take control of the witness and fail. The inability of Node2 to control the witness means that it should not bring up a clustered mailbox server.

When the two nodes are able to interact with each other, the witness is not necessary and could be offline. However, a subsequent failure of either node will prevent the clustered mailbox server from being online if the file share witness is not available.

The file share does not maintain any more state than previously described. Therefore, all cluster configuration information is exchanged between the two nodes themselves. This is important to understand, because it means that if Node1 is available and Node2 is down or partitioned, Node2 cannot rejoin the cluster until it can communicate with Node1, even if it can communicate with the file share witness.

The file share witness support provides a periodic access check of the file share witness. If the access
check fails, an event is generated. The event can be detected, collected, and reported by the monitoring
system. This allows the operations staff to correct the issue, prior to the issue causing an outage due to a
second concurrent failure.

The file share is accessed under the following conditions:

• When a cluster node comes up and only one cluster node is available.

• When a network connection problem prevents a previously reachable node from communicating

with the cluster.

• When a cluster node leaves the cluster.

• Periodically for validation purposes. The frequency is configurable.

For these reasons, the load on the file share is light. As a result, a single server can provide file shares for
multiple CCR environments. However, each CCR environment should have its own dedicated folder and
share on this server.
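As a hedged illustration for Windows Server 2003 (the server, share, folder, and account names are placeholders), the dedicated folder and share are created on the Hub Transport server, and the Majority Node Set quorum resource is then pointed at the share by setting its MNSFileShare private property:

    rem On the Hub Transport server: create the dedicated folder and share for this CCR environment
    mkdir C:\FSW_EXCMS1
    net share FSW_EXCMS1=C:\FSW_EXCMS1 /GRANT:CONTOSO\ClusterServiceAccount,FULL

    rem On a cluster node: point the Majority Node Set quorum resource at the witness share
    cluster res "Majority Node Set" /priv MNSFileShare=\\HUB01\FSW_EXCMS1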

File Share Witness Considerations


CCR is a two-node environment that uses either an MNS quorum with file share witness, or a Node and
File Share Majority quorum instead of a third node (or more nodes) in the cluster that was required in
traditional MNS clusters. A geographically dispersed CCR environment is a two datacenter deployment in
which the active node is deployed in the primary datacenter, and the passive node is deployed in a
secondary datacenter. Thus, in a geographically dispersed CCR environment, there are two options for
placement of the file share: placing it in the primary datacenter, or placing it in a third datacenter.

The first option is to configure the file share on a Hub Transport server in the primary datacenter. A Hub
Transport server is recommended because it allows a messaging administrator to manage and monitor
outages of the file share. Our experience and customer feedback indicates that the most common types of
network service interruptions occur in wide area network (WAN) topologies. Placing the file share in the
primary datacenter is useful because it prevents service interruptions due to network failures between the
two datacenters. Use of this configuration means automatic failover will not occur in the event of an
outage of the primary datacenter. It does, however, ensure that majority in the failover cluster is not
affected by a network failure between the primary and secondary datacenter.
The second option is to configure the file share on a managed server role within a third physical site. A
managed server role is a server that is supported and maintained to a similar degree as other servers that
are critical for the delivery of the messaging service. An example of a managed server role is a Hub
Transport server in the primary datacenter. This third physical site could be a branch office or a third
datacenter. A requirement of this configuration is that the third site must have a network infrastructure to
the primary datacenter and secondary datacenter that has low latency and high reliability.

Transaction Log Replication and Replay


Transaction log replication and replay is used to copy the changed data and update the passive copy's
database. Replication takes advantage of the change history produced by the Extensible Storage Engine
(ESE). This change history is represented as a sequence of fixed-size 1 megabyte (MB) log files. The
replication functionality copies the log files to the passive node as each log file is generated. The
replication mechanism is asynchronous to the online database. When the logs arrive at the passive node,
they are checked for corruption and then replayed into the copy of the database that is stored on the
passive node. The replay process makes the changes described in the change log to the passive node's
database, which makes the passive node's database match the production database with a slight time lag.

Because the data is replicated between the nodes, the clustered mailbox server can operate on either
node in the cluster. This capability provides increased availability because scheduled outages and failures
of one node do not cause an extended outage of the clustered mailbox server. In addition, service outages
of the storage on one node will not impact availability of the other node and the clustered mailbox server.
Assuming that the file share witness is still available and that the passive node can communicate with it, an outage of the active node causes the clustered mailbox server to move to the remaining node, where it continues to operate.

In a CCR environment, the transaction log file folder on the active node is shared using a standard
Windows file share. The object globally unique identifier (GUID) for the storage group is used for the share
name, and a dollar sign ($) is added to the end of the share. The Microsoft Exchange Replication service
on the passive node connects to the share on the active node and copies, or pulls, the log files using the
Server Message Block (SMB) protocol. The passive node then verifies the log file and replays it into the
copy of the database on the passive node.
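You can observe the health of this copy-and-replay pipeline from the Exchange Management Shell. The following is a minimal sketch; the clustered mailbox server and storage group names are placeholders, and the exact set of properties varies slightly between the RTM and SP1 versions of the tools:

    # Check the replication health of one storage group in a CCR environment.
    # "CMS1\SG1" is a placeholder identity (clustered mailbox server\storage group).
    Get-StorageGroupCopyStatus -Identity "CMS1\SG1" |
        Format-List SummaryCopyStatus,CopyQueueLength,ReplayQueueLength,LastLogGenerated

A growing copy queue indicates that log files are not being pulled from the active node quickly enough; a growing replay queue indicates that logs are arriving but have not yet been replayed into the passive copy.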

Note:

The SMB traffic for transaction log file replication is not encrypted. If needed, you can use Internet
Protocol security (IPsec) to encrypt this traffic. Only transaction log file replication occurs using the SMB
protocol. Reseeding a passive copy occurs using the ESE backup application programming interface
(API), which is an unencrypted communication. If needed, IPsec can be used to encrypt this data.
Continuous Replication over Redundant Cluster Networks
In the release to manufacturing (RTM) version of Microsoft Exchange Server 2007, all transaction log file
copying and seeding in a CCR environment occurs over the public network. This configuration has the following limitations:

• When the passive node is unavailable for several hours, a significant number of logs can build up that need to be transferred. The movement of those logs should be as rapid as possible when the passive node becomes available again. Because the logs are copied over the public network, their movement contends with client traffic, which affects clients and slows the resynchronization.

• When the public network fails, the failover is lossy, even though the log data is available.

• Log traffic cannot be placed on an isolated network. Using an isolated network for log communication allows you to provide security for messaging data without using encryption and its associated performance penalty.

• Log storms may occur under some circumstances. When they occur, the system experiences an unusually high replication burden. This could cause client starvation if the log data must be communicated over the same network used to communicate with clients.

Not all of these issues will occur with the same frequency. However, the first issue is effectively
guaranteed to happen every few months because passive nodes are taken offline for regular maintenance
activity.
Exchange 2007 SP1 minimizes the effects of the preceding problems by allowing the administrator to
create one or more mixed networks in the cluster (a mixed network is a cluster network that supports
both internal cluster heartbeat traffic and client traffic) for log shipping. Exchange 2007 SP1 also enables
an administrator to specify a specific mixed network to be used for seeding.

Note:

Cluster networks used for log shipping and seeding must be configured as mixed networks. A mixed
network is any cluster network that is configured for both cluster (heartbeat) and client access traffic.
In addition, on the network adapter being configured with a continuous replication host name, the
administrator must clear the Register this connection’s addresses in DNS check box in the
Advanced TCP/IP properties dialog box and use either static DNS entries or Hosts file entries on each
node to allow name resolution for the newly created host names by each node. The DNS server used by
the network adapter can be located on the public or private network; however, regardless of its
location, it must be accessible by both nodes so that host name resolution can occur. In addition, on
Windows Server 2008, network adapters used for log shipping or seeding require NetBIOS to be
enabled.
Support for log file copying over a mixed network is configured using a new cmdlet called Enable-
ContinuousReplicationHostName. Similarly, turning off this feature is accomplished using the Disable-
ContinuousReplicationHostName cmdlet.

After a clustered mailbox server exists in a CCR environment, an administrator can run Enable-
ContinuousReplicationHostName on both nodes of the cluster and specify additional IP addresses and
host names, which will then be created in dedicated cluster groups that are associated with each node.
After this task has been performed, the Microsoft Exchange Replication service will begin using the newly
created network for log copying shortly after successful configuration and upon confirming that the new
network is operational. If multiple new networks are created, the Microsoft Exchange Replication service
will randomly select one of them. If the specified network becomes unavailable, the Microsoft Exchange
Replication service will automatically begin using other replication networks, or if none are available, it will
begin using the public network for log shipping within five minutes. (Microsoft Exchange Replication
service network discovery occurs every five minutes.) When the preferred replication network is again
available, the Microsoft Exchange Replication service will automatically revert back to using it for log
shipping.
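The following sketch illustrates this configuration on a two-node CCR cluster. The node names, replication host names, and IP addresses are placeholders, and the parameter syntax should be confirmed against the cmdlet reference topics cited below:

    # Create a continuous replication host name on each node of the cluster.
    # NodeA, NodeB, the *-Repl host names, and the addresses are placeholders.
    Enable-ContinuousReplicationHostName -TargetMachine NodeA -HostName NodeA-Repl -IPv4Address 10.0.2.11
    Enable-ContinuousReplicationHostName -TargetMachine NodeB -HostName NodeB-Repl -IPv4Address 10.0.2.12

    # To stop using the dedicated network, remove the host names again.
    Disable-ContinuousReplicationHostName -TargetMachine NodeA -HostName NodeA-Repl
    Disable-ContinuousReplicationHostName -TargetMachine NodeB -HostName NodeB-Repl

Remember to add the matching static DNS or Hosts file entries on each node, as described in the preceding note, so that each node can resolve the new names.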

For more information about these cmdlets, see Enable-ContinuousReplicationHostName and Disable-
ContinuousReplicationHostName.

Support for seeding over a redundant cluster network is configured using the Update-
StorageGroupCopy cmdlet, which has been updated in Exchange 2007 SP1 to include a new parameter
called DataHostNames. This parameter is used to specify which cluster network should be used for
seeding. For more information about the changes to the Update-StorageGroupCopy cmdlet in
Exchange 2007 SP1, see Update-StorageGroupCopy.
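For example, the following sketch (with placeholder identity and host names) seeds a storage group copy over the dedicated replication network:

    # Seed or reseed the passive copy of SG1 using the continuous replication host names.
    Update-StorageGroupCopy -Identity "CMS1\SG1" -DataHostNames NodeA-Repl,NodeB-Repl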

After a cluster network has been created for continuous replication, you can use the Get-
ClusteredMailboxServerStatus cmdlet to view updated information about cluster networks that have
been enabled for continuous replication activity. The new output details include:

• OperationalReplicationHostNames:{Host1,Host2,Host3}

• FailedReplicationHostNames:{Host4}

• InUseReplicationHostNames:{Host1,Host2}
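A quick way to review these fields from the Exchange Management Shell (the clustered mailbox server name is a placeholder) is shown in the following sketch:

    # Show which replication host names are operational, failed, and in use.
    Get-ClusteredMailboxServerStatus -Identity CMS1 | Format-List *ReplicationHostNames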

For more information about the changes to the Get-ClusteredMailboxServerStatus cmdlet in Exchange 2007 SP1, see Get-ClusteredMailboxServerStatus.

Transport Dumpster
The bulk of the data lost during an automatic recovery is subsequently recovered automatically by a Hub Transport server feature called the transport dumpster. The transport dumpster for a specific database may be located on all Hub Transport servers in the Active Directory site containing the clustered mailbox server. As a message passes through Hub Transport servers on its way to a clustered mailbox server in a CCR environment, a copy is kept in the transport queue (mail.que) until the replication window has passed. The transport dumpster is a required component for CCR deployments. It takes advantage of the redundancy in the environment to reclaim some of the data affected by a failover. Specifically, Hub Transport servers maintain a queue of recently delivered mail. This queue is bounded by the amount of time mail is kept and the total space used. When a failover that is not lossless occurs, CCR on the clustered mailbox server automatically requests that every Hub Transport server in the Active Directory site resubmit mail from the transport dumpster queue. The information store automatically deletes the duplicates and redelivers the mail that was lost.

The transport dumpster is enabled for CCR and, in Exchange 2007 SP1, also for local continuous
replication (LCR). The transport dumpster is not enabled for SCR or single copy clusters (SCCs). For CCR,
the necessary condition for an e-mail message to be retained in the transport dumpster is that it has at
least one recipient whose mailbox is on a clustered mailbox server in a CCR environment or, in SP1, on a
mailbox database enabled for LCR.

The transport dumpster is designed to help protect against data loss by providing an administrator with
the option to configure CCR such that the clustered mailbox server will automatically come online on
another node, with a limited amount of data loss. When this happens, the system automatically delivers
all the recent e-mail messages sent to users on this server, by taking advantage of the transport
dumpster where these e-mail messages are still stored. This helps to prevent data loss in most situations.
In a CCR environment, request for redelivery from the transport dumpster on all Hub Transport servers in
the site is performed automatically. In Exchange 2007 RTM, the retry interval is hard-coded to seven
days. In Exchange 2007 SP1, the retry interval is equal to the value set for MaxDumpsterTime. In an LCR
environment, the request for redelivery from all Hub Transport servers in the site occurs as part of the
Restore-StorageGroupCopy task.
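The transport dumpster is sized and aged at the organization level. As a sketch (the values shown are placeholders, not recommendations), the settings can be viewed and changed from the Exchange Management Shell:

    # View the current transport dumpster settings for the organization.
    Get-TransportConfig | Format-List MaxDumpsterSizePerStorageGroup,MaxDumpsterTime

    # Example values only: set the per-storage-group dumpster size and the retention time.
    Set-TransportConfig -MaxDumpsterSizePerStorageGroup "18MB" -MaxDumpsterTime "7.00:00:00"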

Situations in which data loss is not mitigated by the transport dumpster include:

• Drafts folder for any Microsoft Outlook clients in online mode.

• Appointments, contact updates, property updates, tasks, and task updates.

• Outgoing mail that is in transit from the client to the Hub Transport server. There is a period of time during which the e-mail message only exists on the sender's Mailbox server.

Deploying Cluster Continuous Replication


Deploying CCR is similar to deploying a stand-alone Exchange server, and it is similar to deploying an
SCC. For more information about SCCs, see Single Copy Clusters. However, there are some significant
differences to be aware of when deploying CCR. We recommend that you review the following topics
before designing and deploying CCR in your environment:

• Planning for Cluster Continuous Replication

• Cluster Continuous Replication Resource Model

• Cluster Continuous Replication Recovery Behavior

• Scheduled and Unscheduled Outages

• Migrating Clusters That Are Running Previous Versions of Exchange

After you are ready to deploy CCR, you can begin the process by performing the steps in each phase of
installation described in one of the following topics:

• Installing Cluster Continuous Replication on Windows Server 2008

• Installing Cluster Continuous Replication on Windows Server 2003

Enhancements to CCR in Exchange 2007 SP1


Exchange 2007 SP1 includes several enhancements for CCR, including additional Exchange Management
Console user interface elements, improved status and monitoring, and improved performance.
Exchange Management Console Enhancements
Several new user interface elements have been added in Exchange 2007 SP1 that enhance the
management experience for high availability features, including CCR. These improvements include:

• Transport dumpster user interface A new Global Settings tab has been added to the Hub Transport node under the Organization Configuration work area. This tab includes a Transport Settings Properties page that can be used to configure the transport dumpster settings for the organization:

• Maximum size per storage group (MB) Specifies the maximum size of the transport dumpster queue for each storage group.

• Maximum retention time (days) Specifies how long an e-mail message should remain in the transport dumpster queue.

• Continuous replication Additional user interface controls have been added to the Exchange Management Console that enable an administrator to suspend, resume, update, and restore continuous replication. These controls are the equivalent of using the following Exchange Management Shell cmdlets:

• Suspend-StorageGroupCopy

• Resume-StorageGroupCopy

• Update-StorageGroupCopy

• Restore-StorageGroupCopy

You can use these cmdlets and the corresponding Exchange Management Console tasks to
manage continuous replication in both an LCR environment and in a CCR environment.
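For example, the following sketch (with a placeholder identity) suspends and then resumes replication for a single storage group copy, such as during planned storage maintenance:

    # Temporarily stop log copying and replay for SG1, then resume it.
    Suspend-StorageGroupCopy -Identity "CMS1\SG1"
    Resume-StorageGroupCopy -Identity "CMS1\SG1"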
Status and Monitoring Enhancements
Exchange 2007 SP1 also introduces several changes that are designed to enhance the manageability of
Exchange 2007. These changes improve upon the cluster reporting features in Exchange 2007 RTM, and
include additional functionality designed for proactive monitoring of continuous replication environments.
Specifically, the changes and enhancements correct known deficiencies with the Get-
StorageGroupCopyStatus cmdlet, introduce a new cmdlet called Test-ReplicationHealth, and provide
greater visibility into the loss window covered by the transport dumpster.

Improvements to the Get-StorageGroupCopyStatus Cmdlet


In Exchange 2007 RTM, there are several conditions where the status reported by Get-StorageGroupCopyStatus and by the continuous replication performance counters is inaccurate or misleading:

• A storage group that is not active (for example, not changing) can report its status as healthy when it might not be healthy. This situation occurs because the unhealthy condition is not detected until a log is replayed.

• During replication initialization, the replication status is being re-evaluated and may not be accurate. When initialization completes, the status is updated.

• The value of the LastLogGenerated field can be wrong when the database in the storage group is dismounted.

• When there are one or more missing logs in the middle of a log stream, the passive copy continues to try to recover, causing the replication status to switch between failed and healthy states. When this happens, the replay and copy queues continue to grow.

• Under rare conditions, a log can be successfully verified but still fail to replay. In this situation, the system will alternate between failed and healthy states during its attempts to recover. When this happens, the replay and copy queues continue to grow.

The Get-StorageGroupCopyStatus cmdlet has also been enhanced with the addition of new status
information for CCR environments:

• The Get-StorageGroupCopyStatus cmdlet reports a SummaryCopyStatus of ServiceDown when the Microsoft Exchange Replication service on the target computer is not network accessible.

• The Get-StorageGroupCopyStatus cmdlet reports a SummaryCopyStatus of Initializing when the Microsoft Exchange Replication service on the target computer has not completed its initial startup checks. A new performance counter has also been created to represent this status as a Boolean.

• The Get-StorageGroupCopyStatus cmdlet reports a SummaryCopyStatus of Synchronizing when the storage group copy has not completed an incremental reseed.

The new states for the SummaryCopyStatus value are visible only when you use the Exchange 2007 SP1
version of the Exchange management tools. When you use the Exchange 2007 RTM version of the
Exchange management tools, the status for any of the preceding states will be reported as Failed.

Test-ReplicationHealth Cmdlet
Exchange 2007 SP1 introduces a new cmdlet called Test-ReplicationHealth. This cmdlet is designed for
proactive monitoring of continuous replication and the continuous replication pipeline. The Test-
ReplicationHealth cmdlet checks all aspects of replication, cluster services, and storage group replication
and replay status to provide a complete overview of the replication system. Specifically, when run on a
node in the cluster, the Test-ReplicationHealth cmdlet performs the tests described in the following
table.

Tests performed by the Test-ReplicationHealth cmdlet

• Cluster network status Verifies that all cluster-managed networks found on the local node are running. This test is run only in a CCR environment.

• Quorum group state Verifies that the cluster group containing the quorum resource is healthy. This test is run only in a CCR environment.

• File share quorum state Verifies that the value of the FileSharePath used by the Majority Node Set quorum with file share witness is reachable. This test is run only in a CCR environment.

• Clustered mailbox server group state Verifies that the clustered mailbox server is healthy by confirming that all resources in the group are online. This test is run only in a CCR environment.

• Node state Verifies that neither of the nodes in the cluster is in a paused state. This test is run only in a CCR environment.

• DNS registration status Verifies that all cluster-managed network interfaces that have Require DNS registration to succeed set have passed DNS registration. This test is run only in a CCR environment.

• Replication service status Verifies that the Microsoft Exchange Replication service on the local computer is healthy.

• Storage group copy suspended Checks whether continuous replication has been suspended for any storage groups enabled for continuous replication.

• Storage group copy failed Checks whether any storage group copies are in a Failed state.

• Storage group replication queue length Checks whether any storage group has a replication copy queue length greater than best practice thresholds. Currently, these thresholds are:

• Warning Queue length is 3–5 logs.

• Failure Queue length is 6 or more logs.

• Databases dismounted after failover Checks whether any databases are dismounted or failed after a failover has occurred. This test only checks for databases that have failed as a result of a failover.
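To run these checks, execute the cmdlet locally on a cluster node. This is a minimal illustration:

    # Run on a node of the CCR cluster; each check in the preceding list is evaluated
    # and reported with its result.
    Test-ReplicationHealth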
Performance Enhancements
Performance improvements have been made in Exchange 2007 SP1 that benefit high availability
deployments. These improvements include:

• I/O reductions on the disks containing passive copies of storage groups in continuous replication environments In Exchange 2007 SP1, the design of the continuous replication architecture has been modified so that the database cache is now persisted on the passive node in between batches of log replay activity. The persistence of the database cache between batches of log replay activity enables the Microsoft Exchange Replication service to leverage the database caching features of the ESE, which in turn reduces the amount of disk I/O that occurs on the passive copy's logical unit numbers (LUNs). By contrast, in Exchange 2007 RTM, a new database cache was created for each batch of log replay activity, which in some cases made the disk I/O activity on the passive node as much as two or three times the disk I/O on the active node.

• Faster moving of clustered mailbox servers between nodes in a CCR environment These improvements enable clustered mailbox servers to move between nodes in two minutes or less. This includes moves performed by an administrator (using the Move-ClusteredMailboxServer cmdlet), and failovers that are managed by the Cluster service. To accomplish fast moves in a CCR environment, the databases are taken offline without flushing the database cache. This behavior reduces the amount of downtime that occurs when the clustered mailbox server is moved to another node. A sketch of a scheduled move appears after this list.
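The following sketch shows a scheduled move; the server, node, and comment values are placeholders:

    # Move the clustered mailbox server CMS1 to NodeB for planned maintenance on NodeA.
    Move-ClusteredMailboxServer -Identity CMS1 -TargetMachine NodeB -MoveComment "Scheduled maintenance on NodeA"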

Using Standby Continuous Replication with CCR


SCR is a new feature introduced in Exchange 2007 SP1. SCR extends the existing continuous replication
features and enables new data availability scenarios for Exchange 2007 Mailbox servers. SCR uses the
same log shipping and replay technology used by LCR and CCR to provide added deployment options and
configurations.

SCR enables you to use continuous replication to replicate Mailbox server data from a stand-alone Mailbox
server (with or without LCR), or from a clustered mailbox server in an SCC or in a CCR environment.

The process for activating copies of Mailbox server data that are created and maintained by SCR is manual
and is designed to be used when a significant failure occurs (and not for simple server outages that are
recoverable by a restart or some other quick means). You can activate an SCR target using database
portability, the server recovery option (Setup /m:RecoverServer), or if the Mailbox server is clustered,
the clustered mailbox server recovery option (Setup /RecoverCMS). The option you choose will be based
on your configuration and the type of failure that occurs.

For more information about SCR, see Standby Continuous Replication.

For More Information


The following topic areas discuss when and how to use CCR as part of a high availability and site resiliency
plan:
• Planning for Cluster Continuous Replication

• Planning Checklist for Cluster Continuous Replication

• Advantages of Cluster Continuous Replication over Single Copy Clusters

• Cluster Continuous Replication Resource Model

• Cluster Continuous Replication Recovery Behavior

• Scheduled and Unscheduled Outages

• Migrating Clusters That Are Running Previous Versions of Exchange

• Installing Cluster Continuous Replication on Windows Server 2003

• Installing Cluster Continuous Replication on Windows Server 2008

• Managing Cluster Continuous Replication

• How to Troubleshoot Cluster Continuous Replication Issues

Cluster Continuous Replication Resource Model


Applies to: Exchange Server 2007 SP3, Exchange Server 2007 SP2, Exchange Server 2007 SP1,
Exchange Server 2007

Topic Last Modified: 2006-11-29

Cluster continuous replication (CCR) uses the Cluster service to help manage a Mailbox server so that it can automatically recover from failures. When provided with redundant hardware, CCR can automatically
recover from several types of complete failures. For more information about recovery actions for different
kinds of failures, see Cluster Continuous Replication Recovery Behavior.

CCR uses a logical model, called the resource model, of the individual resources that make up a clustered
mailbox server, such as the Microsoft Exchange Information Store service and the databases maintained
by the clustered mailbox server. The resource model is represented as a tree to allow the software to
define an order of actions when a piece of the model starts or stops.

Databases, the Microsoft Exchange Information Store service, and other resources can be brought online
and offline manually by an administrator or automatically by the Cluster service. The term online generally
refers to the process of starting a service and mounting a database. The term offline generally refers to
the process of stopping a service or dismounting a database. The Cluster service tracks the online and
offline states of resources in the resource model.

The resources are used as a focal point for managing the processes, databases, and network identities
associated with the clustered mailbox server. The following is a brief summary of each resource type:

• Storage Group/Database Instance This resource represents a database that is hosted on the clustered mailbox server. When this resource is online, the database is mounted. When this resource is offline, the database is dismounted. The Microsoft Exchange Information Store instance must be online before the database resource can come online.

• Exchange Information Store Instance This resource represents the clustered mailbox server. When this resource is online, the Microsoft Exchange Information Store service is started and capable of accepting MAPI traffic. The clustered mailbox server may or may not have mounted databases. When it is offline, the Microsoft Exchange Information Store service is stopped, and it is not capable of accepting MAPI traffic.

• Exchange System Attendant Instance This resource represents the System Attendant service for the clustered mailbox server. When this resource is online, the System Attendant service is started. When this resource is offline, the System Attendant service is stopped.

• Network Name This resource represents the network name of the clustered mailbox server. When the network name is online, the name is associated with a network adapter on the specified computer. When the network name is offline, the name is not associated with a network adapter on the specified computer.

• IP Address This resource represents the IP address associated with the clustered mailbox server. This IP address is bound to the clustered mailbox server network name in DNS. When the IP address is online, it is associated with a network adapter on the specified computer. When the IP address is offline, it is not associated with a network adapter on the specified computer.

Local Continuous Replication


Applies to: Exchange Server 2007 SP3, Exchange Server 2007 SP2, Exchange Server 2007 SP1,
Exchange Server 2007

Topic Last Modified: 2008-01-17

Local continuous replication (LCR) is a single-server solution that uses built-in asynchronous log shipping
and log replay technology to create and maintain a copy of a storage group on a second set of disks that
are connected to the same server as the production storage group. The production storage group is
referred to as the active copy, and the copy of the storage group maintained on the separate set of disks
is referred to as the passive copy. The following figure illustrates a basic deployment of LCR.

Basic deployment of LCR

LCR provides log shipping, log replay, and a quick manual switch (referred to as activation) to a secondary
copy of the data. LCR is designed to reduce the total cost of ownership for
Microsoft Exchange Server 2007 by:

• Reducing the recovery time for data-level disasters by enabling a quick switch to a second online copy of the data.

• Reducing the number of regular full backups that are required for data protection. Data backups are critical to have when a disaster strikes. Although LCR does not eliminate the need to take backups, it does significantly reduce the need to take regular, daily full backups.

• Enabling you to offload Volume Shadow Copy Service (VSS) backups from the active copy of a storage group to the passive copy of the storage group. All four VSS backup types (full, copy, incremental, and differential) can be taken from the passive copy. Offloading the backups from the active copy to the passive copy preserves valuable disk input/output (I/O) on the active copy's logical unit numbers (LUNs).

LCR enables the configuration, operation, verification, removal, and activation of a storage group copy.
When necessary, a passive copy can be activated as a production database, and then mounted and made
available to clients. Typically, you can do this task as a configuration change either by changing the active
storage group and database paths or by a lower-level operating system action (for example, changing the
mount points associated with the log or database volumes).
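As a sketch of the first approach (the server, storage group, and database names are placeholders, and the exact parameters should be verified against the Restore-StorageGroupCopy reference), activating an LCR passive copy from the Exchange Management Shell might look like the following:

    # If the failed database is still mounted, dismount it first (placeholder identity).
    Dismount-Database -Identity "MBX01\SG1\DB1"

    # Activate the passive copy; -ReplaceLocations points the storage group paths at the copy location.
    Restore-StorageGroupCopy -Identity "MBX01\SG1" -ReplaceLocations

    # Mount the database from the activated copy.
    Mount-Database -Identity "MBX01\SG1\DB1"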

LCR does not have any special storage requirements. Any type of storage that is supported by
Windows Server 2003 or Windows Server 2008 can be used with LCR, including direct-attached storage, Serial Attached SCSI (SAS), and Internet SCSI (iSCSI). For a list of certified storage solutions, see the Windows
Server Catalog of Tested Products.

LCR is an excellent option for customers that need fast recovery from mailbox data failure or corruption but can tolerate server outages for scheduled and unscheduled reasons. LCR provides:

• Rapid, two-step recovery from corruption or failure of a production database.

• Protection for the users that need it most.

• Minimal impact to production database and log disk I/O.

• The ability to offload VSS backups to the passive copy of the database and logs.

• The ability to reduce the total amount of data moved to backup media, while extending the backup window.

• Administration available via the Exchange Management Console or the Exchange Management Shell.

Enhancements to LCR in Exchange 2007 SP1


Microsoft Exchange Server 2007 Service Pack 1 (SP1) includes several enhancements for LCR, including
use of the transport dumpster, added Exchange Management Console user interface elements, improved
status and monitoring, and improved performance.

Transport Dumpster Enabled for LCR


The transport dumpster feature of the Hub Transport server role has been extended in Exchange 2007
SP1 to support LCR. In the release to manufacturing (RTM) version of Microsoft Exchange Server 2007,
the transport dumpster was available only for cluster continuous replication (CCR) environments. Unlike
CCR, in which the request for transport dumpster redelivery is an automatic part of the recovery process,
in an LCR environment, the process is manual. The Restore-StorageGroupCopy cmdlet has been
updated in Exchange 2007 SP1 to include the transport dumpster resubmission request. Thus, when an
administrator activates a passive copy of a storage group in an LCR environment using the Restore-
StorageGroupCopy cmdlet, the transport dumpster resubmission request occurs as part of the activation
process.

The transport dumpster takes advantage of the redundancy in the environment to reclaim some of the
data affected by the failover. Specifically, Hub Transport servers maintain a queue of recently delivered
mail. This queue is bound by the amount of time mail is kept and the total space used. New functionality
has been added to the Restore-StorageGroupCopy task so that when an administrator uses that task to
activate the passive copy of a storage group, the Microsoft Exchange Replication service requests
redelivery of messages in the transport dumpster from each Hub Transport server in the Mailbox server's
site. The information store automatically deletes the duplicates and redelivers mail that was lost.

In Exchange 2007 SP1, the necessary condition for an e-mail message to be retained in the transport
dumpster is that it has at least one recipient whose mailbox is on a clustered mailbox server in a CCR
environment, or on a stand-alone server in a storage group that has been configured for LCR.

Situations in which data loss is not mitigated by the transport dumpster include:

• Drafts folder for any Microsoft Outlook clients in Online Mode.


• Appointments, contact updates, property updates, tasks, and task updates.

• Outgoing mail that is in transit from the client to the Hub Transport server. There is a period of time during which the e-mail message only exists on the sender's Mailbox server.

For detailed steps about how to configure the transport dumpster settings, see How to Configure the
Transport Dumpster for Local Continuous Replication.

Exchange Management Console Enhancements


Several new user interface elements have been added in Exchange 2007 SP1 that enhance the
management experience for high availability features, including LCR. These improvements include:

• Transport dumpster user interface A new Global Settings tab has been added to the Hub Transport node under the Organization Configuration work area. This tab includes a Transport Settings Properties page that can be used to configure the transport dumpster settings for the organization:

• Maximum size per storage group (MB) Specifies the maximum size of the transport dumpster queue for each storage group.

• Maximum retention time (days) Specifies how long an e-mail message should remain in the transport dumpster queue.

• Manage continuous replication Additional user interface controls have been added to the Exchange Management Console that enable an administrator to suspend, resume, update, and restore continuous replication. These controls are the equivalent of using the following Exchange Management Shell cmdlets:

• Suspend-StorageGroupCopy

• Resume-StorageGroupCopy

• Update-StorageGroupCopy

• Restore-StorageGroupCopy

You can use these cmdlets and the corresponding Exchange Management Console tasks to
manage continuous replication in both an LCR environment and in a CCR environment.
Status and Monitoring Enhancements
Exchange 2007 SP1 also introduces several changes that are designed to enhance the manageability of
Exchange 2007. These changes improve upon the cluster reporting features in Exchange 2007 RTM and
include additional functionality designed for proactive monitoring of continuous replication environments.
Specifically, the changes and enhancements correct known deficiencies with the Get-
StorageGroupCopyStatus cmdlet, introduce a new cmdlet called Test-ReplicationHealth, and provide
greater visibility into the loss window covered by the transport dumpster.

Improvements to the Get-StorageGroupCopyStatus Cmdlet


In Exchange 2007 RTM, there are several conditions where the status reported by Get-StorageGroupCopyStatus and by the continuous replication performance counters is inaccurate or misleading:

• A storage group that is not active (for example, not changing) can report its status as healthy when it might not be healthy. This situation occurs because the unhealthy condition is not detected until a log is replayed.

• During replication initialization, the replication status is being re-evaluated and may not be accurate. When initialization completes, the status is updated.

• The value of the LastLogGenerated field can be wrong when the database in the storage group is dismounted.

• When there are one or more missing logs in the middle of a log stream, the passive copy continues to try to recover, causing the replication status to switch between failed and healthy states. When this happens, the replay and copy queues continue to grow.

• Under rare conditions, a log can be successfully verified but still fail to replay. In this situation, the system will alternate between failed and healthy states during its attempts to recover. When this happens, the replay and copy queues continue to grow.

The Get-StorageGroupCopyStatus cmdlet has also been enhanced with the addition of new status
information:

• The Get-StorageGroupCopyStatus cmdlet reports a SummaryCopyStatus of ServiceDown when the Microsoft Exchange Replication service on the target computer is not network accessible.

• The Get-StorageGroupCopyStatus cmdlet reports a SummaryCopyStatus of Initializing when the Microsoft Exchange Replication service on the target computer has not completed its initial startup checks. A new performance counter has also been created to represent this status as a Boolean.

• The Get-StorageGroupCopyStatus cmdlet reports a SummaryCopyStatus of Synchronizing when the storage group copy has not completed an incremental reseed.

The new states for the SummaryCopyStatus value are visible only when you use the Exchange 2007 SP1
version of the Exchange management tools. When you use the Exchange 2007 RTM version of the
Exchange management tools, the status for any of the preceding states will be reported as Failed.

Test-ReplicationHealth Cmdlet
Exchange 2007 SP1 introduces a new cmdlet called Test-ReplicationHealth. This cmdlet is designed for
proactive monitoring of continuous replication and the continuous replication pipeline. The Test-
ReplicationHealth cmdlet checks all aspects of replication, Cluster services, and storage group
replication and replay status to provide a complete overview of the replication system. Specifically, when
run on a node in the cluster, the Test-ReplicationHealth cmdlet performs the tests described in the
following table.

Tests performed by the Test-ReplicationHealth cmdlet

• Cluster network status Verifies that all cluster-managed networks found on the local node are running. This test is run only in a CCR environment.

• Quorum group state Verifies that the cluster group containing the quorum resource is healthy. This test is run only in a CCR environment.

• File share quorum state Verifies that the value of the FileSharePath used by the Majority Node Set quorum with file share witness is reachable. This test is run only in a CCR environment.

• Clustered mailbox server group state Verifies that the clustered mailbox server is healthy by confirming that all resources in the group are online. This test is run only in a CCR environment.

• Node state Verifies that neither of the nodes in the cluster is in a paused state. This test is run only in a CCR environment.

• DNS registration status Verifies that all cluster-managed network interfaces that have Require DNS registration to succeed set have passed Domain Name System (DNS) registration. This test is run only in a CCR environment.

• Replication service status Verifies that the Microsoft Exchange Replication service on the local computer is healthy.

• Storage group copy suspended Checks whether continuous replication has been suspended for any storage groups enabled for continuous replication.

• Storage group copy failed Checks whether any storage group copies are in a Failed state.

• Storage group replication queue length Checks whether any storage group has a replication copy queue length greater than best practice thresholds. Currently, these thresholds are:

• Warning Queue length is 3–5 logs.

• Failure Queue length is 6 or more logs.

• Databases dismounted after failover Checks whether any databases are dismounted or failed after a failover has occurred. This test only checks for databases that have failed as a result of a failover.
Performance Enhancements
Several performance improvements have been made in Exchange 2007 SP1 that benefit high availability
deployments. These improvements include I/O reductions on the disks containing passive copies of
storage groups in continuous replication environments. In Exchange 2007 SP1, the design of the
continuous replication architecture has been modified so that the database cache is now persisted for the
storage group copy in between instances of log replay activity. The persistence of the database cache
between instances of log replay activity enables the Microsoft Exchange Replication service to make use of
the database caching features of the Extensible Storage Engine (ESE), which in turn reduces the amount
of disk I/O that occurs on the passive copy's LUNs. By contrast, in Exchange 2007 RTM, a new database
cache was created for each batch of log replay activity, which in some cases made the disk I/O activity on
the passive LUNs as much as two to three times the disk I/O on the active LUNs.

Using Standby Continuous Replication with LCR


Standby continuous replication (SCR) is a new feature introduced in Exchange 2007 SP1. SCR extends the
existing continuous replication features and enables new data availability scenarios for Exchange 2007
Mailbox servers. SCR uses the same log shipping and replay technology used by LCR and CCR to provide
added deployment options and configurations.

SCR enables you to use continuous replication to replicate Mailbox server data from a stand-alone Mailbox
server (with or without LCR), or from a clustered mailbox server in a single copy cluster (SCC) or in a CCR
environment.

The process for activating copies of Mailbox server data that are created and maintained by SCR is manual
and is designed to be used when a significant failure occurs. The process is not meant to be used for
simple server outages that are recoverable by a restart or some other quick means. You can activate an
SCR target using database portability, the server recovery option (Setup /m:RecoverServer), or, if the
Mailbox server is clustered, the clustered mailbox server recovery option (Setup /RecoverCMS). The
option you choose will be based on your configuration and the type of failure that occurs.

For more information about SCR, see Standby Continuous Replication.

For More Information


The following topics discuss when and how to use LCR as part of a high availability and data resiliency
plan:

• Planning for Local Continuous Replication

• Managing Local Continuous Replication

• How to Troubleshoot Local Continuous Replication Issues
