Version 1.1a
Table of Contents

Document Acceptance
    Reviewers
    Change Control
Overview
System Configuration in GoldenGate-Data Guard installations
    Oracle Data Guard Protection levels
System clock
Conclusion
Appendix
    Appendix A gg_11gie_ext_shared.sh
    Appendix B gg_11gie_ext_non_shared.sh
    Appendix E action.sh (ONLY for RAC)
Document Acceptance

Reviewers:

Reviewer     Title                                  Email
Tracy West   Consulting Member of Technical Staff   Tracy.l.west@oracle.com

Change Control:

Date        Change                                                     Changed by
2/29/2012   Example for externaljob.ora update                         Sourav
3/1/2012    Updated in line with the best practices document for
            "classic extract switchover / failover with data guard."   Sourav
5/1/2012    Formatting corrections                                     SGeorge
Overview
Oracle GoldenGate is Oracle's strategic replication solution for data distribution and data
integration. Oracle Data Guard is Oracle's strategic Disaster Recovery (DR) solution for the
Oracle Database. It is common for customers to deploy both capabilities within the same
configuration given the important role of each technology. For example, GoldenGate capture
processes can be deployed on a source database that is also a primary or a standby database in a
Data Guard configuration. Likewise, a database that is a target for GoldenGate replication can
also be a primary database protected by a Data Guard standby. This document provides best
practices for managing such a configuration when Data Guard role transitions are necessary
(switchover or failover operations where the primary and standby databases reverse roles).
As shown in Figure 1-1, data from a Data Guard primary database is replicated to a target
database by GoldenGate. Data Guard maintains synchronization of the physical standby database
used for disaster recovery. The only time that GoldenGate will use the standby database to
send data to the target database is after a Data Guard switchover or failover operation has
promoted the standby database to the primary role. In this case, the application is moved to the
new primary database and it becomes the new source for the GoldenGate target.
Figure 1-1
Oracle Data Guard Protection levels
The following descriptions summarize the three levels of Data Guard protection.
Maximum Protection This protection level uses a synchronous replication process to ensure
that no data loss will occur if the primary database fails. It also enforces rules that prevent
multiple failure events from causing data loss. This protection level will never allow a primary
database to acknowledge commit success for an unprotected transaction. To provide this level of
protection, the redo data that is needed to recover each transaction must be written to both the
local online redo log and to the standby redo log on at least one standby database before Oracle
can acknowledge commit success to the application. To ensure that data loss cannot occur, the
primary database will shut down if a fault prevents it from writing its redo stream to the standby
redo log of at least one standby database.
Maximum Availability This protection level uses a synchronous replication process that
provides zero data loss protection without compromising the availability of the primary
database. Like Maximum Protection, commit success is not acknowledged to the application
until the redo that is needed to recover that transaction is written to the local online redo log and
to the standby redo log of at least one standby database. Unlike Maximum Protection, however,
the primary database does not shut down if a fault prevents it from writing its redo stream to a
remote standby redo log. The primary database will stall for a maximum of net_timeout seconds
(user configurable) before proceeding, in order to maintain availability of the primary database.
Data Guard automatically resynchronizes primary and standby databases when the connection is
restored. Data loss is possible if a second failure occurs before the resynchronization process is
complete.
Maximum Performance This protection mode (the default) is an asynchronous replication
process that provides the highest level of data protection that is possible without affecting the
performance of the primary database. This is accomplished by acknowledging commit success
as soon as the redo data that is needed to recover that transaction is written to the local online
redo log, without waiting for confirmation by the standby that the data is protected. The redo data
stream of the primary database is transmitted to the standby database directly from the Oracle
in-memory log buffer as quickly as it is generated.
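For reference, the configured protection mode and the current runtime level can be inspected, and the mode changed, from SQL*Plus. A brief sketch (standard Data Guard statements; the target mode shown is only an example):

```
SQL> SELECT protection_mode, protection_level FROM v$database;
SQL> ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE AVAILABILITY;
```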
Risks
When using any Data Guard protection level other than Maximum Protection, there is always the
risk of data loss whenever an unplanned failover occurs. This risk transfers to the GoldenGate
target as well, because:
- The Data Guard primary, the standby, and the GoldenGate target might all be at different
  sequence numbers in the transaction stream.
- The GoldenGate target database could be ahead of or behind the Data Guard standby database
  in time, given that there is no way to ensure that GoldenGate replication and Data Guard redo
  transport are in lock-step.
Responsibilities
1. If using Maximum Protection or Maximum Availability, confirm that the failover was a
zero data loss failover. This is done before the failover is executed, by confirming the
values for PROTECTION_MODE and PROTECTION_LEVEL in V$DATABASE. If
the values both match (e.g. mode and level each report Maximum Availability), there is
no data loss.
2. If there was no data loss, then:
   - Once GoldenGate finishes processing the source data that has accumulated in the
     trails (from the original primary database), it can start processing any new
     transactions from the new primary database (previously the standby database)
     without any data loss.
   - This document outlines the steps that the shell program will perform in case of
     switchover.
3. If you were using Maximum Performance, or if you were using Maximum Availability
and the PROTECTION_LEVEL in V$DATABASE says UNSYNCHRONIZED
(indicating that an earlier outage had impacted Data Guard transport) then the failover
will result in data loss (Data Guard was not able to transmit all committed transactions to
the standby database before the primary failed). If you can still access the original
primary after failover, determine how much and which data was lost, and how you want
to resolve the problem.
   - If the source and target databases are both Oracle databases, you can use
     GoldenGate Veridata to identify the out-of-sync data.
   - This document outlines the steps that the shell program will perform in case of
     failover.
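The zero-data-loss decision in step 1 reduces to comparing PROTECTION_MODE with PROTECTION_LEVEL. A minimal sketch of that comparison (the helper function and sample values below are illustrative, not part of the original scripts; in practice the two values would come from `SELECT protection_mode, protection_level FROM v$database;`):

```shell
#!/bin/sh
# Decide whether a failover can be zero-data-loss by comparing the
# configured PROTECTION_MODE against the runtime PROTECTION_LEVEL.
is_zero_data_loss() {
  # $1 = PROTECTION_MODE, $2 = PROTECTION_LEVEL (as reported by V$DATABASE)
  if [ "$1" = "$2" ]; then
    echo "no data loss"
  else
    echo "possible data loss"
  fi
}

is_zero_data_loss "MAXIMUM AVAILABILITY" "MAXIMUM AVAILABILITY"
is_zero_data_loss "MAXIMUM AVAILABILITY" "UNSYNCHRONIZED"
```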
To greatly simplify GoldenGate recovery, it is strongly recommended that you keep the
GoldenGate checkpoint files (in <GoldenGate home>/dirchk) as well as your trail files (at least
the most recent ones) on a storage device that can be accessed by the standby system.
This section outlines a sample configuration where GoldenGate and Data Guard work
concurrently on the same systems. Following this section are discussions about the
typical administrative scenarios that are involved when using GoldenGate and Data
Guard in the same environment.
The following is the configuration that is used in the discussions:
*****************************************************************************
$ (coe-01)[dgp] /home/oracle/profile\> vi $ORACLE_HOME/rdbms/admin/externaljob.ora
# NOTES
# For Porters: The user and group specified here should be a lowly privileged
#              user and group for your platform. For Linux this is nobody
#              and nobody.
# MODIFIED
# rramkiss 12/09/05 - Creation
#
##############################################################################
# External job execution configuration file externaljob.ora
#
# This file is provided by Oracle Corporation to help you customize
# your RDBMS installation for your site. Important system parameters
# are discussed, and default settings given.
#
# This configuration file is used by dbms_scheduler when executing external
# (operating system) jobs. It contains the user and group to run external
# jobs as. It must only be writable by the owner and must be owned by root.
# If extjob is not setuid then the only allowable run_user
# is the user Oracle runs as and the only allowable run_group is the group
# Oracle runs as.
run_user = oracle
run_group = oinstall
*****************************************************************************
For RAC, after a switchover/failover the action trigger can fire on any node of the new
primary, but the action script works only on a designated server. To handle this situation,
the trigger fires the "action.sh" script (Appendix E), which SSHes to the
designated server and executes the actual failover script.
This applies to the "long-distance switchover" scenario, where the Oracle DBFS file
system is used to store the GoldenGate binaries. The OS user (in our case "oracle") who actually
performs the failover should have "sudo" access, and the "/etc/sudoers" file should be updated
with the "NOPASSWD:" option so that the script can unmount the DBFS file system on the
old primary server as part of the switchover/failover operation. Also, the parameter
"Defaults requiretty" in "/etc/sudoers" should be commented out (on both primary
and standby hosts) so that the remote shell can execute "fusermount" via sudo.
#Defaults requiretty
oracle ALL=NOPASSWD: ALL
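A minimal sketch of the RAC hop described above (hostnames and paths are placeholders; the actual action.sh is listed in Appendix E):

```
#!/bin/sh
# Hop from whichever RAC node fired the trigger to the designated
# GoldenGate server and run the real failover script there.
DESIGNATED_HOST=<designated OGG server>
ssh "$DESIGNATED_HOST" "<path>/<failover script>"
```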
Important! Do not follow these steps on your systems. They are only an outline of the steps that
should be considered as part of a switchover procedure. The procedures in this document cannot
be applied to individual environments without in-depth analysis of the systems, databases, and
applications involved. Make sure to test the approach you plan to use so that you are aware of all
the steps involved.
Source Database Switchover/Failover to Standby System
There are two different scenarios for the switchover/failover. The first scenario assumes that the
checkpoint files and trail files are available to the standby system. We call this scenario "shared
storage switchover/failover". The second scenario assumes that checkpoint and trail files are not
shared but are installed in a DBFS file system which is part of the primary database. We
call this scenario "long-distance switchover/failover".
This scenario assumes that checkpoint and trail files between the source and standby database
servers are, or can be, shared through the use of shared storage. Shared checkpoint files imply
that the extract processes have the same names on the primary and the standby. The file
structures relative to the GoldenGate home directory (or relative notations for files, e.g. ./dirdat)
have to match between the primary and the standby system. On Unix or Linux-based systems
you can use soft links to achieve this.
The scenario below assumes that shared storage is in place.
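The soft-link approach can be sketched as follows (the /tmp demo paths are ours, for illustration only; in a real setup dirchk and dirdat would live on the shared storage device and be linked into each GoldenGate home):

```shell
#!/bin/sh
# Keep checkpoint (dirchk) and trail (dirdat) directories on shared
# storage and link them into the GoldenGate home, so that relative
# paths such as ./dirchk and ./dirdat resolve identically on the
# primary and the standby.
GG_HOME=/tmp/gg_home_demo
SHARED=/tmp/gg_shared_demo
mkdir -p "$GG_HOME" "$SHARED/dirchk" "$SHARED/dirdat"
ln -sfn "$SHARED/dirchk" "$GG_HOME/dirchk"
ln -sfn "$SHARED/dirdat" "$GG_HOME/dirdat"
ls -l "$GG_HOME"
```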
System setup:
System A (primary): Extract A ------> DataPump A-----> (remote) Replicat A
System B (standby): Extract A ------> DataPump A-----> (remote) Replicat A
During regular processing only the GoldenGate processes on source System A are active (not on
System B).
Note that Extract A and Data Pump A on both systems share checkpoint files and trail files (and
the parameter files are identical except for maybe environment settings). Replicat A runs on the
target environment which is not involved in the switchover/failover.
#RAC
CREATE OR REPLACE TRIGGER failover_actions AFTER DB_ROLE_CHANGE ON DATABASE
DECLARE
role VARCHAR(30);
BEGIN
SELECT DATABASE_ROLE INTO role FROM V$DATABASE;
IF role = 'PRIMARY' THEN
dbms_scheduler.create_job (
job_name => '<schema_name>.<job_name>',
job_type => 'EXECUTABLE',
job_action => '<path>/action.sh',
enabled => TRUE);
END IF;
END;
The trigger will get executed in the event of a switchover or failover, and the primary node will
create and execute a database job which will in turn run the shell program. (For RAC, the script
"action.sh" will be executed by the trigger, which will in turn log in to the designated OGG server
and execute the main action script.) The script will first determine whether the "role change" is
due to a switchover or a failover.
Next we create the file system in the tablespace by running the "dbfs_create_filesystem.sql" script as
the test user. The script accepts two parameters identifying the tablespace and the file system name.
cd $ORACLE_HOME/rdbms/admin
sqlplus dbfs_user/dbfs_user
SQL> @dbfs_create_filesystem.sql dbfs_ts staging_area
The script creates a partitioned file system. Although Oracle considers this the best option from a
performance and scalability perspective, it can have two drawbacks:
- Space cannot be shared between partitions. If the size of the files is small compared to the
  total file system size this is not a problem, but if individual files form a large proportion
  of the total file system size, then ENOSPC errors may be produced.
- File rename operations may require the file to be rewritten, which can be problematic for
  large files.
If these issues present a problem to you, you can create non-partitioned file systems using the
"dbfs_create_filesystem_advanced.sql" script. In fact, the "dbfs_create_filesystem_advanced.sql"
script is called by the "dbfs_create_filesystem.sql" script, which defaults many of the advanced
parameters.
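For example, a non-partitioned file system might be created as follows (the option list mirrors the parameters documented in the script header; verify the exact order and spelling against the script shipped with your release):

```
cd $ORACLE_HOME/rdbms/admin
sqlplus dbfs_user/dbfs_user
SQL> @dbfs_create_filesystem_advanced.sql dbfs_ts staging_area nocompress nodedup noencrypt non-partition
```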
If we later wish to drop a file system, this can be done using the "dbfs_drop_filesystem.sql"
script with the file system name.
cd $ORACLE_HOME/rdbms/admin
sqlplus dbfs_user/dbfs_user
SQL> @dbfs_drop_filesystem.sql staging_area
FUSE Installation
In order to mount the DBFS we need to install the "Filesystem in Userspace" (FUSE) software.
If you are not planning to mount the DBFS, or you are running on a non-Linux platform, this
section is unnecessary. The FUSE software can be installed manually, from the OEL media, or
via Oracle's public yum server. If possible, use the yum installation.
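As a sketch, the yum route might look like the following (package names per common Oracle Linux installations, and the group membership required by dbfs_client; verify both against your release):

```
# yum install kernel-devel fuse fuse-libs
# usermod -a -G fuse oracle
```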
To unmount the file system issue the following command from the "root" OS user.
# fusermount -u /mnt/dbfs
#RAC
CREATE OR REPLACE TRIGGER failover_actions AFTER DB_ROLE_CHANGE ON DATABASE
DECLARE
role VARCHAR(30);
BEGIN
SELECT DATABASE_ROLE INTO role FROM V$DATABASE;
IF role = 'PRIMARY' THEN
dbms_scheduler.create_job (
job_name => '<schema_name>.<job_name>',
job_type => 'EXECUTABLE',
job_action => '<path>/action.sh',
enabled => TRUE);
END IF;
END;
The trigger will get executed in the event of a switchover or failover, and the primary node will
create and execute a database job which will in turn run the shell program. (For RAC, the script
"action.sh" will be executed by the trigger, which will in turn log in to the designated OGG server
and execute the main action script.) The script will first determine whether the "role change" is
due to a switchover or a failover.
In this case the program will execute the following steps:
1. Mount the DBFS file system on the new primary.
2. Log in to the old primary (from which the database switched over), kill the Manager and
   pump processes, and unmount the DBFS file system.
3. Start the GoldenGate Manager, extract, and extract pump on the new primary server.
System clock
Conclusion
The goal of this document was to share an approach that automates the failover
of OGG when used in a Fast Failover Data Guard environment. Please keep in
mind that while this solution has been well tested for a very basic use case, you
will most likely need to modify it to meet your specific needs. Furthermore, since
our goal is to share a method, it is your responsibility to certify that any solution
you implement meets your specific business requirements, as no warranty or
support can be provided for your implementation of this method.
When configuring GoldenGate in an environment that has either the source or target
databases protected by Data Guard, you must plan for the possibility of data loss should
a Data Guard failover occur. If there is data loss due to the failure of the primary
database, you need to decide how to handle the differences between the GoldenGate
source and the target databases.
Appendix
Appendix A gg_11gie_ext_shared.sh
This appendix contains a Linux/Unix script that will perform switchover/failover of the GoldenGate
extract/extract pump when triggered by the "DB_ROLE_CHANGE" system event. This script is
designed to work in an environment where GoldenGate is installed on a file system shared
between the primary and the standby database servers. Please change the variables at the
beginning of the script to suit your environment.
The script follows.
#!/bin/sh
#Set variables
OGG_HOME=<GoldenGate home directory path>
FAL_NODE1=<primary node name>
FAL_NODE2=<standby node name>
PROFILE_NODE1=<primary node profile script name>
PROFILE_NODE2=<standby node profile script name>
NODE1_HOME=<primary node home directory where profile script resides>
NODE2_HOME=<standby node home directory where profile script resides>
extract=<extract name>
pump=<pump name>
V_WAIT_FOR_ARCHIVE=<time in seconds to wait for archivelog after failover occurs>

#Set DB profile
v_host=`hostname`
if [ "$v_host" = "$FAL_NODE1" ]
then
  cd $NODE1_HOME
  . ./$PROFILE_NODE1
else
  cd $NODE2_HOME
  . ./$PROFILE_NODE2
fi

v_host=`hostname`
if [ "$v_host" = "$FAL_NODE1" ]
then
  #Switchover steps
  #Remote connection to stop mgr/pump on the failed node
  ssh "$FAL_NODE2" >/dev/null 2>&1 ". ./$PROFILE_NODE2;$OGG_HOME/ggsci <<EOFF
stop $pump
stop mgr!
exit
EOFF"
  sleep $V_WAIT_FOR_ARCHIVE
  $OGG_HOME/ggsci <<EOFF
start mgr
sh sleep 2
start $extract
start $pump
exit
EOFF
  exit 0
else
  #Remote connection to stop mgr/pump on the failed node
  ssh "$FAL_NODE1" >/dev/null 2>&1 ". ./$PROFILE_NODE1;$OGG_HOME/ggsci <<EOFF
stop $pump
stop mgr!
exit
EOFF"
  sleep $V_WAIT_FOR_ARCHIVE
  $OGG_HOME/ggsci <<EOFF
start mgr
sh sleep 2
start $extract
start $pump
exit
EOFF
  exit 0
fi
Appendix B gg_11gie_ext_non_shared.sh
This appendix contains a Linux/Unix script that will perform switchover/failover of GoldenGate
extract/extract pump when triggered by "DB_ROLE_CHANGE" system event. This script is
designed to work in an environment where GoldenGate is installed in a DBFS file system which
is a part of the primary database.
Note: Please don't forget to save the DBFS database user password in a text file and reference
that file by its fully qualified path in the script. If storing the password in a file is a security
concern, use Oracle Wallet authentication instead.
#!/bin/sh
#Set Variables
OGG_HOME=<GoldenGate home directory path>
DBFS_MNT=<DBFS mount point>
FAL_NODE1=<primary node name>
FAL_NODE2=<standby node name>
PROFILE_NODE1=<primary node profile script name>
PROFILE_NODE2=<standby node profile script name>
NODE1_HOME=<primary node home directory where profile script resides>
NODE2_HOME=<standby node home directory where profile script resides>
extract=<extract name>
pump=<extract pump name>
TNS_NODE1=<tns connect string for primary db>
TNS_NODE2=<tns connect string for standby db>
DBFSUSER=<DBFS username>
syspassword=<Oracle sys user password for the target db>
#Password file should be created manually and should reside in
#the following directory
PASSWORDFILE_NODE1=$NODE1_HOME/<passwordfile name>
PASSWORDFILE_NODE2=$NODE2_HOME/<passwordfile name>
V_WAIT_FOR_ARCHIVE=<time in seconds to wait for archivelog after failover occurs>

#Set DB profile and mount DBFS file system
v_host=`hostname`
if [ "$v_host" = "$FAL_NODE1" ]
then
  cd $NODE1_HOME
  . ./$PROFILE_NODE1
  nohup dbfs_client $DBFSUSER@$TNS_NODE1 $DBFS_MNT <$PASSWORDFILE_NODE1 &
else
  cd $NODE2_HOME
  . ./$PROFILE_NODE2
  nohup dbfs_client $DBFSUSER@$TNS_NODE2 $DBFS_MNT <$PASSWORDFILE_NODE2 &
fi

#Switchover/failover steps
echo "Switchover/failover steps"
v_host=`hostname`
if [ "$v_host" = "$FAL_NODE1" ]
then
  #Remote connection to kill mgr/pump on the failed node
  ssh "$FAL_NODE2" >/dev/null 2>&1 "mgr_proc_id=\`ps -ef|grep $OGG_HOME/dirprm/mgr.prm|grep -v grep|awk '{print \$2}'\`;pump_proc_id=\`ps -ef|grep $OGG_HOME/dirprm/${pump}.prm|grep -v grep|awk '{print \$2}'\`;kill -9 \$mgr_proc_id;kill -9 \$pump_proc_id;sudo fusermount -u -z $DBFS_MNT"
  sleep $V_WAIT_FOR_ARCHIVE
  $OGG_HOME/ggsci <<EOFF
start mgr
sh sleep 2
start $extract
start $pump
exit
EOFF
else
  #Remote connection to kill mgr/pump on the failed node
  ssh "$FAL_NODE1" >/dev/null 2>&1 "mgr_proc_id=\`ps -ef|grep $OGG_HOME/dirprm/mgr.prm|grep -v grep|awk '{print \$2}'\`;pump_proc_id=\`ps -ef|grep $OGG_HOME/dirprm/${pump}.prm|grep -v grep|awk '{print \$2}'\`;kill -9 \$mgr_proc_id;kill -9 \$pump_proc_id;sudo fusermount -u -z $DBFS_MNT"
  sleep $V_WAIT_FOR_ARCHIVE
  $OGG_HOME/ggsci <<EOFF
start mgr
sh sleep 2
start $extract
start $pump
exit
EOFF
fi