
RAC FAQS

What is Cache Fusion in RAC?
What is the significance of the FREELISTS table setting in RAC?
What is the cluster_interconnects init parameter? What value needs to be set
here?
Why do you need a separate network switch for the RAC nodes' interconnect
configuration?
What interconnect bandwidth do you recommend for the private network
switch?
How many NICs did you use in the RAC install?
What is the local_listener setting? Where do you set this parameter? What
should the value be?
What is the remote_listener setting? Where do you set this parameter? What
should the value be?
What is rconfig? When do you use it? Can you use it with raw devices?
Where do you find this tool?
What is ocrconfig? When do you use it?
Do we need $ORACLE_HOME on all RAC nodes?
What is OCFS? Have you used it before? What is cluvfy? When do you use
it?
What is ASM?
What is runcluvfy.sh? What is cluvfy? When do you use them?
What are racdiag.sql and RDA.sh? What information do they provide?
Have you ever seen ORA-29740? When do you encounter this error? How
do you solve it?
What is cmcfg.ora? What is ocmstart.sh?
What is CRS? Do you need clusterware when you use CRS?
What are the various clusterware products available in the market? Does
9i RAC require clusterware?
What are the three different load balancing techniques you can use with
RAC? Where do you set them up?
What is gsdctl? When do you use it?
What is the ORION tool?
What is an LPAR? Have you ever worked with them?
Which OS requires the hangcheck process? What does it do?
What is the OCR/voting file?
What is crsctl? When do you use this tool?
What is the srvconfig_loc setting? Where do you set this value?
How do you initialize the raw volume where your SRVM configuration file is
located? When do you use the srvmconfig -init command?
How do you start the global services daemon? When do you use the gsdctl
start command?
What is the srvctl command?
How do you list the number of instances running in a RAC environment?
How do you start or stop a RAC database?
When do you use the DBCA_RAW_CONFIG setting? Where do you set it?

What are fal_server and fal_client? Where do you need to set them? What
would the values be?
How do you set up a physical standby and a logical standby? Give me the
steps in detail. What are the differences?
What is Oracle Streams? How do you enable replication with Streams?
Give me the steps.
How do you resolve/skip the data errors which come with a logical standby?
How do you skip or exclude specific tables from replication in the logical
standby configuration?
How do you identify the latency and issues with a logical standby?
How do you bypass the database guard and allow modifications to the
tables?
Give me the top 5 init parameters which need to be set in the source db
in the physical standby config.
Give me the top 5 init parameters which need to be set in the target db
in the physical standby config.
How do you handle nologging issues if force logging is not enabled in
the physical standby environment?
How do you ensure 0% data loss in a physical standby configuration?
How do you perform a switchover from primary to standby and from standby to
primary?
What is the lock_name_space setting? When do you need to use it?
Give me the syntax to create a standby controlfile using RMAN.
How do you identify the list of standalone patches applied?
How do you roll back a standalone patch?
Give me the list of steps and the approach you take for a patch set upgrade.
Give me the list of steps and the approach you take for database release
upgrades (9i to 10g).
How do you apply standalone patches?
How do you figure out the prerequisite OS patches/packages required for a
specific database release?
How do you compile the steps for a database upgrade?
How do you perform cross-platform migrations?
How do you raise a service request with Oracle when you can't resolve certain
upgrade issues?

DATA GUARD INTERVIEW QUESTIONS:


1. Can Oracle's Data Guard be used on Standard Edition, and if so how? How
can you test that the standby database is in sync?
Oracle's Data Guard technology is a layer of software and automation built on
top of the standby database facility. In Oracle Standard Edition it is possible
to have a standby database and update it *manually*. Roughly: put your
production database in archivelog mode, create a hot backup of the database,
and move it to the standby machine. Then create a standby controlfile on the
production machine, and ship that file, along with all the archived redo log
files, to the standby server. Once you have all these files assembled, place
them in their proper locations, recover the standby database, and you're
ready to roll. From this point on, you must manually ship and manually
apply the archived redo logs to stay in sync with production.
To test your standby database, make a change to a table on the production
server and commit the change. Then manually switch a logfile so those
changes are archived. Manually ship the newest archived redo log file and
manually apply it on the standby database. Then open your standby
database in read-only mode and select from your changed table to verify
that the changes are available. Once you're done, shut down your standby and
start it up again in standby mode.
2. What is the difference between Active Data Guard and the Logical Standby
implementation of 10g Data Guard?
Active Data Guard is mostly about the physical standby.
Use a physical standby for testing without compromising protection of the
production system. You can open the physical standby read/write and do some
destructive things in it (drop tables, change data, whatever; run a test,
perhaps with Real Application Testing). While this is happening, redo is still
streaming from production, so if production fails you are covered. Use a
physical standby for reporting while in managed recovery mode. Since a
physical standby supports all of the datatypes and a logical standby does not
(11g added broader support, but not 100%), there are times when a logical
standby isn't sufficient. A physical standby also permits fast incremental
backups when backups are offloaded to it.
3. What is Data Guard?
Oracle Data Guard is Oracle Corporation's disaster recovery solution. It is
widely used in the industry to handle primary site failures through failover
and switchover to a standby database.
4. What are the uses of Oracle Data Guard?
a) Oracle Data Guard ensures high availability, data protection, and disaster
recovery for enterprise data.
b) Data Guard provides a comprehensive set of services that create,
maintain, manage, and monitor one or more standby databases to enable
production Oracle databases to survive disasters and data corruptions.
c) With Data Guard, administrators can optionally improve production
database performance by offloading resource-intensive backup and reporting
operations to standby systems.
5. What is Redo Transport Services?
Redo transport services control the automated transfer of redo data from the
production database to one or more archival destinations.
Redo transport services perform the following tasks:
a) Transmit redo data from the primary system to the standby systems in the
configuration.
b) Manage the process of resolving any gaps in the archived redo log files
due to a network failure.
c) Automatically detect missing or corrupted archived redo log files on a
standby system and automatically retrieve replacement archived redo log
files from the primary database or another standby database.
6. What are apply services?
Apply services apply redo data on the standby database to maintain
transactional synchronization with the primary database. Redo data can be
applied either from archived redo log files or, if real-time apply is enabled,
directly from the standby redo log files as they are being filled, without
requiring the redo data to be archived first at the standby database. Apply
services also allow read-only access to the data.
7. What is the difference between physical and logical standby databases?
The main difference between physical and logical standby databases is the
manner in which apply services apply the archived redo data:
a) For physical standby databases, Data Guard uses Redo Apply technology,
which applies redo data on the standby database using standard recovery
techniques of an Oracle database.
b) For logical standby databases, Data Guard uses SQL Apply technology,
which first transforms the received redo data into SQL statements and then
executes the generated SQL statements on the logical standby database.

8. What is Data Guard Broker?
The Data Guard broker is the management layer for Data Guard configurations.
Primary and standby databases can be managed using the SQL command-line
interfaces or the Data Guard broker interfaces, which include a
command-line interface (DGMGRL) and a graphical user interface that is
integrated in Oracle Enterprise Manager. The broker can be used to:
a) Create and enable Data Guard configurations, including setting up redo
transport services and apply services
b) Manage an entire Data Guard configuration from any system in the
configuration
c) Manage and monitor Data Guard configurations that contain Oracle RAC
primary or standby databases
d) Simplify switchovers and failovers by allowing you to invoke them using
either a single click in Oracle Enterprise Manager or a single command in
the DGMGRL command-line interface
e) Enable fast-start failover to fail over automatically when the primary
database becomes unavailable. When fast-start failover is enabled, the Data
Guard broker determines if a failover is necessary and initiates the failover to
the specified target standby database automatically, with no need for DBA
intervention.
9. What are the Data Guard protection modes? Summarize each.
Maximum availability :
This protection mode provides the highest level of data protection that is
possible without compromising the availability of a primary database.
Transactions do not commit until all redo data needed to recover those
transactions has been written to the online redo log and to at least one
standby database. If the primary database cannot write its redo to a standby,
it keeps running as if it were in maximum performance mode until the standby
is reachable and synchronized again.
Maximum performance :
This is the default protection mode. It provides the highest level of data
protection that is possible without affecting the performance of a primary
database. This is accomplished by allowing transactions to commit as soon as
all redo data generated by those transactions has been written to the online
log.
Maximum protection :
This protection mode ensures that no data loss will occur if the primary
database fails. To provide this level of protection, the redo data needed to
recover a transaction must be written to both the online redo log and to at
least one standby database before the transaction commits. To ensure that
data loss cannot occur, the primary database will shut down, rather than
continue processing transactions, if it cannot write its redo to at least one
standby database.

RMAN QUESTIONS:
1. What is RMAN ?
Recovery Manager (RMAN) is a utility that can manage your entire Oracle
backup and recovery activities.
Which Files must be backed up?
Database Files (with RMAN)
Control Files (with RMAN)
Archived (offline) redo log files (with RMAN)
INIT.ORA (manually)
Password Files (manually)
2. When you take a hot backup by putting a tablespace in begin backup mode,
Oracle records the SCN from the header of a database file. What happens when
you run a hot backup of the database in RMAN, which backs up at block level?
How does RMAN record that a block has been backed up? How does RMAN
know what blocks were backed up so that it doesn't have to scan them
again?
Oracle 10g introduced the Block Change Tracking feature. Once enabled, this
feature records the blocks modified since the last backup and stores the log
of them in a block change tracking file. During incremental backups RMAN uses
this file to identify the specific blocks that must be backed up. This
improves RMAN's performance as it does not have to scan whole datafiles to
detect changed blocks.
Logging of changed blocks is performed by the CTWR (Change Tracking Writer)
process, which is also responsible for writing data to the block change
tracking file. RMAN uses SCNs on the block level and the archived redo logs
to resolve any inconsistencies in the datafiles from a hot backup. What RMAN
does not require is putting the tablespace in BACKUP mode, thus freezing the
SCN in the header. Rather, RMAN keeps this information in either your control
files or in the RMAN repository (i.e., Recovery Catalog).
3. What are the Architectural components of RMAN?
1. RMAN executable
2. Server processes
3. Channels
4. Target database
5. Recovery catalog database (optional)
6. Media management layer (optional)
7. Backups, backup sets, and backup pieces

4. What are Channels?


A channel is an RMAN server process started when there is a need to
communicate with an I/O device, such as a disk or a tape. A channel is what
reads and writes RMAN backup files. It is through the allocation of channels
that you govern I/O characteristics such as:
Type of I/O device being read or written to, either a disk or an sbt_tape
Number of processes simultaneously accessing an I/O device
Maximum size of files created on I/O devices
Maximum rate at which database files are read
Maximum number of files open at a time
5. Why is the catalog optional?
Because RMAN manages backup and recovery operations, it requires a place
to store necessary information about the database. RMAN always stores this
information in the target database control file, which is why a catalog is
optional. You can also store RMAN metadata in a recovery catalog schema
contained in a separate database. The recovery catalog schema must be stored
in a database other than the target database.
6. What does complete RMAN backup consist of ?
A backup of all or part of your database. This results from issuing an RMAN
backup command. A backup consists of one or more backup sets.
7. What is a Backup set?
A logical grouping of backup files -- the backup pieces -- that are created
when you issue an RMAN backup command. A backup set is RMAN's name for
a collection of files associated with a backup. A backup set is composed of
one or more backup pieces.
8. What is a Backup piece?
A physical binary file created by RMAN during a backup. Backup pieces are
written to your backup medium, whether to disk or tape. They contain blocks
from the target database's datafiles, archived redo log files, and control files.
When RMAN constructs a backup piece from datafiles, there are several
rules that it follows:
A datafile cannot span backup sets
A datafile can span backup pieces as long as it stays within one backup set
Datafiles and control files can coexist in the same backup sets
Archived redo log files are never in the same backup set as datafiles or
control files
RMAN is the only tool that can operate on backup pieces. If you
need to restore a file from an RMAN backup, you must use RMAN to do it.
There's no way for you to manually reconstruct database files from the
backup pieces.

9. What are the benefits of using RMAN?


1. Incremental backups that only copy data blocks that have changed since
the last backup.
2. Tablespaces are not put in backup mode, thus there is no extra redo log
generation during online backups.
3. Detection of corrupt blocks during backups.
4. Parallelization of I/O operations.
5. Automatic logging of all backup and recovery operations.
6. Built-in reporting and listing commands.
PERFORMANCE TUNING
1. A tablespace has a table with 30 extents in it. Is this bad? Why or why
not?
Multiple extents in and of themselves aren't bad. However if you also have
chained rows this can hurt performance.
2. How do you set up tablespaces during an Oracle installation?
You should always attempt to use the Oracle Flexible Architecture standard or
another partitioning scheme to ensure proper separation of SYSTEM,
ROLLBACK, REDO LOG, DATA, TEMPORARY and INDEX segments.
3. You see multiple fragments in the SYSTEM tablespace, what should you
check first?
Ensure that users don't have the SYSTEM tablespace as their TEMPORARY or
DEFAULT tablespace assignment by checking the DBA_USERS view.
4. What are some indications that you need to increase the
SHARED_POOL_SIZE parameter?
Poor data dictionary or library cache hit ratios, getting error ORA-04031.
Another indication is steadily decreasing performance with all other tuning
parameters the same.
5. What is the general guideline for sizing db_block_size and
db_file_multiblock_read_count for an application that does many full table
scans?
Oracle almost always reads in 64 KB chunks. The product of the two parameters
should equal 64 KB or a multiple of 64 KB.
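The guideline above is simple arithmetic, and a small shell sketch can make it concrete. It assumes the block size is given in bytes and that a single 64 KB read is the target; the function name multiblock_for_64k is hypothetical, and the result is only the rule of thumb quoted above, not a value Oracle mandates.

```shell
#!/bin/sh
# Hypothetical helper (not an Oracle tool): given db_block_size in bytes,
# print the multiblock read count that makes one multiblock read 64 KB.
multiblock_for_64k() {
    block_size=$1
    target=$((64 * 1024))
    # Only block sizes that divide 64 KB evenly give a whole block count.
    if [ "$block_size" -le 0 ] || [ $((target % block_size)) -ne 0 ]; then
        echo "block size must be a positive divisor of 65536" >&2
        return 1
    fi
    echo $((target / block_size))
}
```

For example, `multiblock_for_64k 8192` prints 8, so an 8 KB block size pairs with a multiblock read count of 8 under this guideline.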
6. What is the fastest query method for a table?
Fetch by rowid
7. Explain the use of TKPROF? What initialization parameter should be turned
on to get full TKPROF output?
The tkprof tool is a tuning tool used to determine cpu and execution times for
SQL statements. You use it by first setting timed_statistics to true in the
initialization file and then turning on tracing for either the entire database via
the sql_trace parameter or for the session using the ALTER SESSION
command. Once the trace file is generated you run the tkprof tool against the
trace file and then look at the output from the tkprof tool. This can also be
used to generate explain plan output.
8. When looking at v$sysstat you see that sorts (disk) is high. Is this bad or
good? If bad, how do you correct it?
If you get excessive disk sorts this is bad. This indicates you need to tune the
sort area parameters in the initialization files. The major sort area parameter
is the SORT_AREA_SIZE parameter.
9. When should you increase copy latches? What parameters control copy
latches?
When you get excessive contention for the copy latches as shown by the
"redo copy" latch hit ratio. You can increase copy latches via the initialization
parameter LOG_SIMULTANEOUS_COPIES to twice the number of CPUs on
your system.
10. Where can you get a list of all initialization parameters for your instance?
How about an indication if they are default settings or have been changed?
You can look in the init.ora file for an indication of manually set parameters.
For all parameters, their value and whether or not the current value is the
default value, look in the v$parameter view.
11. Describe hit ratio as it pertains to the database buffers. What is the
difference between instantaneous and total hit ratio; which should be used
for tuning?
Hit ratio is a measure of how many times the database was able to read a
value from the buffers versus how many times it had to re-read a data value
from the disks. A value greater than 80-90% is good; less could indicate
problems. If you take the ratio from the existing statistics, this will be a
cumulative value since the database started. If you do a comparison between
pairs of readings based on some arbitrary time span, this is the
instantaneous ratio for that time span. Generally speaking, an instantaneous
reading gives more valuable data, since it tells you what your instance was
doing over the time span it covers.
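The difference between the cumulative and the instantaneous ratio can be sketched in shell. This is a minimal sketch, assuming the classic formula 1 - physical reads / logical reads, where logical reads is the sum of the "db block gets" and "consistent gets" statistics; the function names and the sample readings in the note below are hypothetical.

```shell
#!/bin/sh
# buffer_hit_ratio LOGICAL_READS PHYSICAL_READS
# Prints the hit ratio as a percentage with two decimals (assumed formula:
# 1 - physical / logical). Prints "n/a" when there were no logical reads.
buffer_hit_ratio() {
    awk -v l="$1" -v p="$2" 'BEGIN {
        if (l <= 0) { print "n/a"; exit }
        printf "%.2f\n", (1 - p / l) * 100
    }'
}

# interval_hit_ratio LOGICAL_1 PHYSICAL_1 LOGICAL_2 PHYSICAL_2
# Instantaneous ratio for the span between two readings: the same formula
# applied to the differences between the readings.
interval_hit_ratio() {
    buffer_hit_ratio $(( $3 - $1 )) $(( $4 - $2 ))
}
```

With a first reading of 1000000 logical and 100000 physical reads and a second of 1100000 and 150000, the cumulative ratio at the second reading is 86.36 while `interval_hit_ratio 1000000 100000 1100000 150000` prints 50.00. The cumulative figure still looks acceptable while the interval figure shows that the most recent workload is mostly missing the cache.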
12. Discuss row chaining, how does it happen? How can you reduce it? How
do you correct it?
Row chaining occurs when a VARCHAR2 value is updated and the length of
the new value is longer than the old value and won't fit in the remaining
block space. This results in the row chaining to another block. It can be
reduced by setting the storage parameters on the table to appropriate
values (for example, a larger PCTFREE). It can be corrected by export and
import of the affected table.

13. When looking at the estat events report you see that you are getting
buffer busy waits. Is this bad? How can you find what is causing it?
Buffer busy waits may indicate contention in redo, rollback or data blocks.
You need to check the v$waitstat view to see what areas are causing the
problem. The "class" column tells you which kind of block is affected and the
"count" column tells you how many waits it has had. UNDO is rollback
segments, DATA is database buffers.
14. If you see contention for library caches how can you fix it?
Increase the size of the shared pool.
15. If you see statistics that deal with "undo" what are they really talking
about?
Rollback segments and associated structures.
16. If a tablespace has a default pctincrease of zero what will this cause (in
relationship to the smon process)?
The SMON process won't automatically coalesce its free space fragments.
17. If a tablespace shows excessive fragmentation what are some methods
to defragment the tablespace? (7.1, 7.2 and 7.3 only)
In Oracle 7.0 to 7.2, the 'alter session set events 'immediate trace
name coalesce level ts#';' command is the easiest way to defragment
contiguous free space fragmentation. The ts# parameter corresponds to the
ts# value found in the ts$ SYS table. In version 7.3 the 'alter tablespace
<tablespace_name> coalesce;' command is best. If free space isn't contiguous
then export, drop and import of the tablespace contents may be the only way
to reclaim non-contiguous free space.
18. How can you tell if a tablespace has excessive fragmentation?
If a select against the dba_free_space view shows that the count of a
tablespace's free extents is greater than the count of its data files, then it
is fragmented.
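The rule above is a plain comparison of two counts per tablespace. A minimal shell sketch, assuming the counts have already been queried and are supplied as text lines of the form "TABLESPACE FREE_EXTENTS DATAFILES"; the function name and this input format are hypothetical.

```shell
#!/bin/sh
# Read "TABLESPACE FREE_EXTENTS DATAFILES" lines on stdin and print the
# names of tablespaces whose free-extent count exceeds their datafile
# count, which is the fragmentation rule of thumb stated above.
flag_fragmented() {
    awk '$2 > $3 { print $1 }'
}
```

Given the lines "USERS 12 2" and "TOOLS 1 1", only USERS is printed, because 12 free extents exceed its 2 datafiles.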
19. You see the following on a status report:
redo log space requests 23
redo log space wait time 0
Is this something to worry about? What if redo log space wait time is high?
How can you fix this?
Since wait time is zero, no. If wait time was high it might indicate a need for
more or larger redo logs.
20. What can cause a high value for recursive calls? How can this be fixed?
A high value for recursive calls is caused by improper cursor usage, excessive
dynamic space management actions, and/or excessive statement re-parses.
You need to determine the cause and correct it by either relinking
applications to hold cursors, using proper space management techniques
(proper storage and sizing), or ensuring repeat queries are placed in packages
for proper reuse.
21. If you see a pin hit ratio of less than 0.8 in the estat library cache report
is this a problem? If so, how do you fix it?
This indicates that the shared pool may be too small. Increase the shared
pool size.
22. If you see the value for reloads is high in the estat library cache report is
this a matter for concern?
Yes, you should strive for zero reloads if possible. If you see excessive
reloads then increase the size of the shared pool.
23. You look at the v$rollstat view and see that there is a large
number of shrinks and they are of relatively small size. Is this a problem?
How can it be fixed if it is a problem?
A large number of small shrinks indicates a need to increase the size of the
rollback segment extents. Ideally you should have no shrinks or a small
number of large shrinks. To fix this just increase the size of the extents and
adjust optimal accordingly.
24. You look at the v$rollstat view and see that you have a large
number of wraps. Is this a problem?
A large number of wraps indicates that the extent size for your rollback
segments is probably too small. Increase the size of your extents to reduce
the number of wraps. You can look at the average transaction size in the
same view to get the information on transaction size.
25. In a system with an average of 40 concurrent users you get the following
from a query on rollback extents:
ROLLBACK              CUR EXTENTS
--------------------- -----------
R01                   11
R02                   8
R03                   12
R04                   9
SYSTEM                4
You have room for each to grow by 20 more extents each. Is there a
problem? Should you take any action?
No there is not a problem. You have 40 extents showing and an average of
40 concurrent users. Since there is plenty of room to grow no action is
needed.
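The figure of 40 extents in the answer matches the four non-SYSTEM segments (11 + 8 + 12 + 9); counting SYSTEM as well would give 44. A minimal shell sketch of that sum, assuming the listing is supplied as "NAME EXTENTS" lines; the function name is hypothetical.

```shell
#!/bin/sh
# Read "NAME EXTENTS" lines on stdin and print the total number of extents
# for all rollback segments except SYSTEM.
sum_rollback_extents() {
    awk '$1 != "SYSTEM" { total += $2 } END { print total + 0 }'
}
```

Feeding it the five rows from the listing above prints 40.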
26. You see multiple extents in the temporary tablespace. Is this a problem?
As long as they are all the same size this isn't a problem. In fact, it can even
improve performance since Oracle won't have to create a new extent when a
user needs one.

RAC :
Some useful commands to manage Oracle 10g RAC
Check status of Oracle RAC :

$ srvctl status database -d RAC
Instance RAC1 is running on node orarac1
Instance RAC2 is running on node orarac2
$ srvctl status instance -d RAC -i RAC1
Instance RAC1 is running on node orarac1
$ srvctl status asm -n orarac1
ASM instance +ASM1 is running on node orarac1
$ srvctl status nodeapps -n orarac1
VIP is running on node: orarac1
GSD is running on node: orarac1
PRKO-2016 : Error in checking condition of listener on node: orarac1
ONS daemon is running on node: orarac1
For PRKO-2016, see Oracle Bug No. 4625482,
"CHECKING/STARTING/STOPING NAMED LISTENER WITH SRVCTL NODEAPPS
FAILS /W PRKO-2016". It is a known issue in Oracle 10.2.0.1 and can be
ignored if the listener works fine.
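The status output above also answers the earlier question about counting running instances. A minimal shell sketch, assuming the "Instance ... is running on node ..." line format shown in the sample output; the function name is hypothetical.

```shell
#!/bin/sh
# Count the lines on stdin that report an instance as running, based on the
# "Instance <name> is running on node <node>" format shown above.
count_running_instances() {
    grep -c '^Instance .* is running on node '
}
```

It is meant to be used at the end of a pipeline, for example `srvctl status database -d RAC | count_running_instances`, which would print 2 for the sample output above.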
Stop/Start Oracle RAC:

1. Stop Oracle 10g on one of the RAC nodes.
$ export ORACLE_SID=RAC1
$ srvctl stop instance -d RAC -i RAC1
$ srvctl stop asm -n orarac1
$ srvctl stop nodeapps -n orarac1
2. Start Oracle 10g on one of the RAC nodes.
$ export ORACLE_SID=RAC1
$ srvctl start nodeapps -n orarac1
$ srvctl start asm -n orarac1
$ srvctl start instance -d RAC -i RAC1
3. Stop/start Oracle 10g on all RAC nodes.
$ srvctl stop database -d RAC
$ srvctl start database -d RAC
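The per-node sequences above run in opposite orders: the instance is stopped first and started last. A minimal shell sketch that only prints the commands in that order, so the plan can be reviewed before anything is run; the function name rac_node_plan is hypothetical and the commands are the ones listed above.

```shell
#!/bin/sh
# rac_node_plan ACTION DATABASE INSTANCE NODE
# Prints (does not execute) the srvctl commands for one node in the order
# used above: instance, ASM, nodeapps to stop; the reverse to start.
rac_node_plan() {
    action=$1 db=$2 inst=$3 node=$4
    case "$action" in
        stop)
            echo "srvctl stop instance -d $db -i $inst"
            echo "srvctl stop asm -n $node"
            echo "srvctl stop nodeapps -n $node"
            ;;
        start)
            echo "srvctl start nodeapps -n $node"
            echo "srvctl start asm -n $node"
            echo "srvctl start instance -d $db -i $inst"
            ;;
        *)
            echo "usage: rac_node_plan stop|start DB INSTANCE NODE" >&2
            return 1
            ;;
    esac
}
```

For example, `rac_node_plan stop RAC RAC1 orarac1` prints the three stop commands from step 1. Printing rather than executing is a deliberate choice for a sketch like this, since the right order matters more than automation here.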

Check Oracle Listener on nodes:

$ srvctl config listener -n orarac1


orarac1 LISTENER_ORARAC1
$ lsnrctl start LISTENER_ORARAC1

Check, backup, restore OCR:

$ ocrconfig -showbackup
$ ocrconfig -export /data/backup/rac/ocrdisk.bak
Note: before restoring the OCR, stop Clusterware on all nodes.
$ ocrconfig -import /data/backup/rac/ocrdisk.bak
$ cluvfy comp ocr -n all     //verification
$ ocrcheck                   //check OCR disk usage

Check configuration of ORACLE RAC:

$ srvctl config database
RAC
$ srvctl config database -d RAC
orarac1 RAC1 /space/oracle/product/10.2.0/db_2
orarac2 RAC2 /space/oracle/product/10.2.0/db_2
$ srvctl config asm -n orarac2
+ASM2 /space/oracle/product/10.2.0/db_2
$ srvctl config nodeapps -n orarac1 -a -g -s -l
VIP exists.: /orarac1-vip.banctecmtl.lan/10.21.51.13/255.255.224.0/Public
GSD exists.
ONS daemon exists.
Listener exists.
SQL> column HOST_NAME format a10;
SQL> select INSTANCE_NAME, HOST_NAME, STARTUP_TIME, STATUS, PARALLEL,
     DATABASE_STATUS, ACTIVE_STATE from gv$instance;

INSTANCE_NAME    HOST_NAME  STARTUP_T STATUS       PAR DATABASE_STATUS   ACTIVE_ST
---------------- ---------- --------- ------------ --- ----------------- ---------
rac2             ORARAC2    16-SEP-08 OPEN         YES ACTIVE            NORMAL
rac1             ORARAC1    19-SEP-08 OPEN         YES ACTIVE            NORMAL
Using Oracle RAC Guard Commands for Planned Outages :
When planned outages occur, use pfsctl commands to shut down the instance
on the node where maintenance will be performed. The next procedure
contains general instructions for shutting down both the primary role node
and secondary role node. When one node is unavailable, Oracle RAC Guard
cannot perform failover.
Planned Outage on the Secondary Node
The procedure for using pfsctl commands for a planned outage on the
secondary node follows:
Stop the secondary instance. Use commands similar to the following:
# pfsctl
PFSCTL> stop_secondary
When the command completes you will see output similar to the following:
stop_secondary command succeeded.
Complete the desired maintenance on the secondary node.
Restore the secondary instance to the secondary node. Use the following
command:
PFSCTL> restore
The output is similar to:
restore command succeeded.
Planned Outage on the Primary Node
For a planned outage on the primary node use the following pfsctl
commands:
Move the primary role to the secondary instance. Use commands similar to
the following:
# pfsctl
PFSCTL> move_primary
When the command completes you will see output similar to the following:
move_primary command succeeded.
Complete the desired maintenance on the former primary node.
Restore the secondary role to the former primary node. Use the following
command:
PFSCTL> restore
When the command completes you will see output similar to the following:
restore command succeeded.
(Optional) Move the primary role back to the original primary node and
restore the secondary role to the original secondary node. Use the following
command:
PFSCTL> switchover
When the command completes you will see output similar to the following:
switchover command succeeded.

Configuring two-phase commit distributed transactions with Oracle RAC :

Real Application Clusters (RAC) configurations for Oracle 10g have an inherent
issue with the transaction manager when Oracle attempts to recover two-phase
commit distributed transactions that span multiple Oracle RAC
nodes. A problem can occur when one node fails and Oracle opens up the
surviving nodes for business before Oracle RAC completes the
necessary recovery action for the node that has failed. The application
server's ability to maintain transaction affinity lets you
circumvent this issue.
About this task
Errors can occur when the recovery process attempts to commit or roll back a
transaction branch through a RAC node that was previously active but later
failed. The transaction manager would receive the following exception:
ORA-24756: transaction does not exist
If this error is encountered, the Oracle database administrator might need to
manually resolve the in-doubt transaction by forcing a rollback or commit.
If you do not want manual intervention, however, you might want
to configure an automatic and transparent strategy for transaction recovery.
If the in-doubt transaction is not resolved, any subsequent transactions will
receive the following exception:
ORA-01591: lock held by in-doubt distributed transaction
The result is that portions of the database will not be usable.

The key to a transparent recovery strategy is to eliminate the possibility of a
global transaction spanning more than one transaction branch over multiple
RAC nodes. A transaction branch corresponds to a database connection that
is enlisted in a global transaction. If all connections in a global two-phase
commit transaction originate from the same node, transaction recovery
problems should not arise. Configure an Oracle RAC with the application
server to prevent errors with two-phase transactions.
The application server maintains transaction affinity for incoming
connections, and you can take advantage of this feature to configure
automatic recovery for Oracle RAC with two-phase commit transactions. If
you implement this configuration, all connections from a given application
server will be received from the same Oracle node, and the connections will
finish on that same node. This configuration will avoid situations in which
transactions span multiple nodes, and you should not experience a recovery
problem if one or more Oracle nodes go down.
Procedure
You can elect to manually resolve the in-doubt transaction.
Get the orphaned transaction ID. Issue the following command:
sql > select state, local_tran_id, global_tran_id from dba_2pc_pending
where state = 'prepared';
Roll back all of the transaction IDs that are in the prepared phase.
sql > rollback force '<local_tran_id>';
Configure an automatic strategy for transaction recovery.
Create an Oracle service that has only one primary node. Creating the
service with one primary node will ensure that load balancing is disabled. You
can also specify one or more alternate nodes with the -a parameter. Run this
command to create the service:
srvctl add service -d <database_name> -s <service_name> -r <primary_nodes>
-a <alternate_nodes>
Enable Distributed Transaction Processing (DTP) on the Oracle service. DTP
was first introduced in Oracle 10gR2. Each DTP service is a singleton service
that is available on only one Oracle RAC instance. Run this command:
execute dbms_service.modify_service (service_name => '<service_name>' ,
dtp => true);
Configure each cluster member in the application server to use the Oracle
DTP service.
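For the manual resolution path in the procedure above, each prepared transaction needs its own rollback force statement. A minimal shell sketch that turns a list of local transaction IDs into those statements without executing anything; the function name and the sample IDs in the note below are hypothetical, and the IDs are assumed to come from the dba_2pc_pending query.

```shell
#!/bin/sh
# Read local transaction IDs (one per line) on stdin and print one
# ROLLBACK FORCE statement per ID. Blank lines are skipped and nothing is
# sent to the database.
gen_rollback_force() {
    while IFS= read -r tran_id; do
        [ -n "$tran_id" ] || continue
        printf "rollback force '%s';\n" "$tran_id"
    done
}
```

For example, piping the two lines 1.21.17 and 2.5.99 into `gen_rollback_force` prints one statement for each ID, which a DBA can review before running them in SQL*Plus.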
Results
If you configured an automatic recovery strategy, the DTP service will start
automatically on the preferred instance. However, if the database is
restarted, the DTP service will not start automatically. You can start the DTP
service using this command:
srvctl start service -d <database_name> -s <service_name>
If a RAC node stops working, Oracle will not fail over the DTP service until the
Oracle RAC cleanup and recovery is complete. Even if the Oracle nodes come
back up, the Oracle DTP service will not return to the freshly restarted RAC
node. Instead, you will have to manually move the service to the restarted
RAC node.
When you configure DTP on the Oracle service, you have transferred load
balancing from the Oracle JDBC provider to the application server. The
workload will be distributed by the application server instead of Oracle, which
is why you created services that do not implement load balancing and only
use one primary node. This configuration prevents situations in which
transaction processes span multiple RAC nodes and alleviates recovery
problems that can arise when one or more RAC nodes fail.
