Conclusions
You don't need RAC to use Parallel Concurrent Processing (PCP).
If you have PCP enabled, secondary nodes must be defined during the upgrade to R12.
Tuning of TCP, SQL*Net and PMON parameters can minimize PCP failover time.
Implement failover-sensitive work shifts.
Definitions
CP => Concurrent Processing
DCD => Dead Connection Detection
ICM => Internal Concurrent Manager
IM => Internal Monitor
CRM => Conflict Resolution Manager
PCP => Parallel Concurrent Processing
PMON => Process Monitor for ICM
Concurrent Request
PCP Failover
[Figure: database on DB node RH8; application nodes RH7, RH8 and RH9, each with a SQL*Net client configured through sqlnet.ora; PCP running across the application nodes.]
Concurrent Managers
Manager Type                  Service Instance                        Program
Internal Concurrent Manager   Internal Manager                        FNDLIBR
Conflict Resolution Manager   Conflict Resolution Manager             FNDCRM
Internal Monitor              Internal Monitor: Node                  FNDIMON
                              Service Manager: Node                   FNDSM
Concurrent Manager            Standard Manager                        FNDLIBR
Concurrent Manager            Inventory Manager                       INVLIBR
Concurrent Manager            Session History Cleanup                 FNDLIBR
Concurrent Manager            PA Streamline Manager                   PALIBR
Transaction Manager           CRP Inquiry Manager                     CYQLIB
Transaction Manager           FastFormula Transaction Manager         FFTM
Transaction Manager           PO Document Approval Manager            POXCON
Transaction Manager           Transaction Manager                     FNDTMTST
                              Scheduler/Prerelease Manager            FNDSVC
                              OAM Generic Collection Service: Node    FNDSVC
Concurrent Processing
1. The concurrent processing server communicates with the database using Oracle SQL*Net.
2. The concurrent program log or output file from a request is passed back as a report to the Report Review Agent.
3. The Report Review Agent passes a file containing the entire report to the Forms server.
4. The Forms Services component passes the report back to the user's browser one page at a time.
Profile options can be used to control the size of the files and pages passed, to suit report volume and available network capacity.
[Figure: concurrent processing architecture. Browser (HTML, JInitiator/Java interface) and web server; Forms Server and Reports Server; SQL*Net connections to the database; ICM (FNDLIBR), Service Manager (FNDSM), Internal Monitor (FNDIMON), Conflict Resolution Manager (FNDCRM) and Standard Manager (FNDLIBR) processing requests; Report Review Agent reading request log and output files (.rdx).]
Service Manager
FNDSM process - Communicates with the Internal Concurrent Manager, Concurrent Manager, and non-Manager Service processes. The Service Manager (SM) spawns and terminates manager and service processes (these could be Forms or Apache listeners, the Metrics or Reports Server, and any other process controlled through Generic Service Management). The SM is chained to the ICM: when the ICM terminates, the SM that resides on the same node as the ICM also terminates. After termination, the SM reinitializes only when it has a function to perform (starting or stopping a process), so periods when the SM is not active are normal.
Service Manager
All processes initialized by the SM inherit the same environment as the SM. The SM's environment is set by the APPSORA.env file and the gsmstart.sh script. The apps_<sid> listener must be active on each CP node to support the SM connection to the local instance. There should be a Service Manager active on each node where a concurrent or non-manager service process will reside.
FNDSM Failure
FNDSM failover as noted in the concurrent manager log:
Could not contact Service Manager FNDSM_RH8_VIS. The TNS alias could not be located, the listener process on RH8 could not be contacted, or the listener failed to spawn the Service Manager process.
Found dead process: spid=(962754), cpid=(2259578), Service Instance=(1045)
CONC-SM TNS FAIL Call to PingProcess failed for WFMAILER
CONC-SM TNS FAIL Call to StopProcess failed for WFMAILER
CONC-SM TNS FAIL Call to PingProcess failed for FNDCPGSC
FNDSM Failover
Found dead process: spid=(716870), cpid=(2259580), Service Instance=(2009)
Found dead process: spid=(1442020), cpid=(2259579), Service Instance=(2010)
Starting WFMGSMD Concurrent Manager : 15-AUG-2008 13:28:56
Starting WFMGSMDB Concurrent Manager : 15-AUG-2008 13:28:56
Starting WFALSNRSVCB Concurrent Manager : 15-AUG-2008 13:28:57
Starting STANDARD Concurrent Manager : 15-AUG-2008 13:30:31
Starting Internal Concurrent Manager Concurrent Manager : 15-AUG-2008 13:30:32
Internal Monitor
(FNDIMON process) - Communicates with the Internal Concurrent Manager. This manager/service is used to implement Parallel Concurrent Processing; you do not need to run it unless you are using PCP. The Internal Monitor (IM) monitors whether the Internal Concurrent Manager is still running, and if the ICM crashes, it restarts the ICM on the IM's local node, which may be a different node from the one where the ICM failed. During a node failure in a PCP environment, the IM restarts the ICM on a surviving node (multiple ICMs may be started on multiple nodes, but only the first ICM started will eventually remain active; all others will gracefully terminate). There should be an Internal Monitor defined on each node to which the ICM may migrate.
Standard Manager
(FNDLIBR process) - Communicates with the Service Manager and any client application process. The Standard Manager is a worker process that initiates and executes client requests on behalf of Applications batch and OLTP clients.
Standard Manager
The Standard Manager is active on RH9, even though no primary node is defined
Since no secondary node is defined, the Standard Manager will not fail over.
Failover Processes in the Work Shifts definition is the number of processes (3) that will run when the Standard Manager fails over to the secondary node.
Transaction Manager
A Transaction Manager communicates with the Service Manager and any user process initiated on behalf of Forms or a Standard Manager request. A Transaction Manager supports synchronous processing of requests from a client program: it gets a request from a client program to run a server-side program synchronously, and returns a status/results to the client program. At runtime, as many of these managers are started as are defined. A Transaction Manager does not poll the concurrent request table for new requests. Only one transaction manager is needed per database, not one per instance.
Transaction Managers
ATG RUP3 (4334965) or higher provides an option to use AQs in place of pipes. Set the profile option 'Concurrent:TM Transport Type' to 'QUEUE' and verify that the transaction manager works across the RAC instances. Pipes are more efficient but require a Transaction Manager to be running on each DB instance. Navigate to the Concurrent > Manager > Define screen and set up the primary and secondary node names for the transaction managers.
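The sizing rule on this slide can be written down as a tiny helper. This is only a sketch: the function name and the 'PIPE' label are hypothetical. It encodes just the statement above, that pipes need a Transaction Manager on each DB instance while the QUEUE (AQ) transport needs one per database.

```python
def transaction_managers_needed(transport_type: str, db_instances: int) -> int:
    """Minimum Transaction Managers for one TM definition (hypothetical helper)."""
    if db_instances < 1:
        raise ValueError("at least one database instance is required")
    transport = transport_type.strip().upper()
    if transport == "QUEUE":
        return 1  # AQ queues are database objects, visible from every instance
    if transport == "PIPE":
        return db_instances  # database pipes are local to one instance
    raise ValueError(f"unknown transport type: {transport_type!r}")


print(transaction_managers_needed("PIPE", 2))   # 2
print(transaction_managers_needed("QUEUE", 2))  # 1
```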
GSM
Generic Services
Linux users should not activate the Reports Server under GSM.
Starting GSM
Apps listener (listener.ora) -> gsmstart.sh -> exec FNDSM
adcmctl.sh
adcmctl.sh calls: startmgr.sh, batchmgr.sh, CONCSUB, FNDSVCRG
Verify GSM
To verify GSM is working, start the concurrent managers. Once GSM is enabled, the ICM uses Service Managers to start all concurrent managers and activated services. If the ICM is successfully starting the managers, then GSM has been configured properly. If managers and/or services fail to start, errors should appear in the ICM log file.
Kill FNDSM
applvis 9007 1 0 11:53 ? 00:00:00 FNDSM
applvis 9159 9155 0 11:55 ? 00:00:00 FNDLIBR
applvis 9161 5683 0 11:55 pts/3 00:00:00 grep FND
[applvis@rh9 scripts]$ kill -9 9007
[applvis@rh9 scripts]$ ps -ef |grep FND
applvis 9159 9155 0 11:55 ? 00:00:00 FNDLIBR
applvis 9169 1 0 11:55 ? 00:00:00 FNDSM
applvis 9249 5683 0 11:57 pts/3 00:00:00 grep FND
Kill FNDCRM
[applvis@rh9 scripts]$ ps -ef |grep FNDCRM
applvis 8886 1 0 11:52 ? 00:00:00 FNDCRM APPS/ZGA13053E1E1B7BA773417089054DA88F194EAC0D687728CC2551870E6B78C4B439 EADB287342795115A88DBC85788CCB4 FND FNDCRM N 10 c LOCK Y RH9 1302318
[applvis@rh9 scripts]$ kill -9 8886
[applvis@rh9 scripts]$ ps -ef |grep FNDCRM
applvis 9457 9392 0 12:09 ? 00:00:00 FNDCRM APPS/ZG26430816FA3570354BC57DE47FF105D145F8DE226EFE58CE04B416633DCB90126 7BFECFA7585114F7090060EFE1147BE FND FNDCRM N 10 c LOCK Y RH9 1302343
Both of these services were restarted before I could enter the grep command to find the corresponding process.
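The respawn seen in these sessions can also be checked by comparing two `ps -ef` snapshots. This is a minimal sketch. It assumes the standard `ps -ef` column order (UID PID PPID C STIME TTY TIME CMD), and the helper names are hypothetical, not part of any Oracle tool.

```python
def fnd_pids(ps_output: str, program: str) -> set[int]:
    """PIDs whose command name (first word of CMD) equals `program`."""
    pids = set()
    for line in ps_output.splitlines():
        fields = line.split(None, 7)  # UID PID PPID C STIME TTY TIME CMD
        if len(fields) < 8:
            continue  # skip prompts and blank lines
        if fields[7].split()[0] == program:
            pids.add(int(fields[1]))
    return pids


def respawned(before: str, after: str, program: str) -> bool:
    """True if `program` is running again under a PID not seen before."""
    old, new = fnd_pids(before, program), fnd_pids(after, program)
    return bool(new) and not (new & old)


before = "applvis 9007 1 0 11:53 ? 00:00:00 FNDSM\napplvis 9161 5683 0 11:55 pts/3 00:00:00 grep FND"
after = "applvis 9169 1 0 11:55 ? 00:00:00 FNDSM\napplvis 9249 5683 0 11:57 pts/3 00:00:00 grep FND"
print(respawned(before, after, "FNDSM"))  # True
```

Comparing the command name rather than searching the whole line keeps the `grep FND` entries from being counted as FND processes.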
In Release 11i, the secondary node doesn't need to be filled in for failover to occur.
reviver.sh DCD
Reviver
[Figure: reviver flowchart. When the ICM starts to shut down, it spawns the reviver. The reviver sleeps until the database is reachable again, kills the previous ICM database session, starts the ICM if none has started, and exits.]
From the CM log file:
The ICM has lost its database connection and is shutting down.
Spawning reviver process to restart the ICM when the database becomes available again.
Spawned reviver process 10910.
reviver.log
The ICM has lost its database connection and is shutting down.
Spawning reviver process to restart the ICM when the database becomes available again.
Spawned reviver process 10910.
TCP
TCP/IP is a connection-oriented protocol; TCP implements packet timeouts and retransmission in an effort to guarantee safe, in-order delivery of data packets. If a timely acknowledgement is not received in response to a probe packet, the TCP/IP stack retransmits the packet some number of times before timing out. After TCP/IP gives up, SQL*Net receives notification that the probe failed.
TCP Keepalive
At this time, client-side SQL*Net connections do not enable keepalive for TCP connections by default. However, keepalive can be enabled by adding the ENABLE=BROKEN parameter to the DESCRIPTION of the SQL*Net connect string (for example, in the TNS alias in tnsnames.ora). **WARNING** Keepalive intervals are typically set to 2 hours or more (i.e., it can take more than 2 hours to notice a dead server even if keepalive is enabled). To make keepalive useful for PCP and TAF, the keepalive interval needs to be reduced to a smaller value (such as 2 minutes). If there are many idle connections on your network, reducing the keepalive interval can increase network traffic significantly.
ENABLE=BROKEN
Sample TNS alias to enable keepalive (notice the ENABLE=BROKEN clause):
VIS_BALANCE =
  (DESCRIPTION =
    (ENABLE=BROKEN)
    (ADDRESS_LIST =
      (LOAD_BALANCE = ON)
      (FAILOVER = ON)
      (ADDRESS = (PROTOCOL = TCP)(HOST = rh8)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = rh6)(PORT = 1521))
    )
  )
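Because one missing parenthesis is enough to break an alias like this, a quick structural check can help before an edited descriptor is deployed. The sketch below is a simplified reader for the (KEY = value) nesting shown above. It is not Oracle's tnsnames.ora parser, and `parse_descriptor` and `has_enable_broken` are hypothetical helpers.

```python
def parse_descriptor(text: str):
    """Parse nested '(KEY = value)' groups into (key, value) tuples.

    Simplified illustration only. A value is either a string or a list of
    child tuples. Raises ValueError if the nesting is malformed.
    """
    pos = 0

    def skip_ws():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def group():
        nonlocal pos
        pos += 1  # consume '('
        eq = text.find("=", pos)
        if eq == -1:
            raise ValueError("missing '=' in group")
        key = text[pos:eq].strip().upper()
        pos = eq + 1
        skip_ws()
        if pos < len(text) and text[pos] == "(":
            children = []
            while True:  # read child groups until the closing ')'
                skip_ws()
                if pos >= len(text):
                    raise ValueError(f"unclosed ({key} ...")
                if text[pos] == ")":
                    pos += 1
                    return key, children
                if text[pos] != "(":
                    raise ValueError(f"stray text inside ({key} ...")
                children.append(group())
        end = text.find(")", pos)
        if end == -1:
            raise ValueError(f"unclosed ({key} ...")
        value = text[pos:end].strip()
        pos = end + 1
        return key, value

    skip_ws()
    if pos >= len(text) or text[pos] != "(":
        raise ValueError("descriptor must start with '('")
    result = group()
    skip_ws()
    if pos != len(text):
        raise ValueError("extra ')' or trailing text")
    return result


def has_enable_broken(descriptor: str) -> bool:
    """True if (ENABLE=BROKEN) sits directly under DESCRIPTION."""
    key, value = parse_descriptor(descriptor)
    if key != "DESCRIPTION" or not isinstance(value, list):
        return False
    return any(k == "ENABLE" and isinstance(v, str) and v.upper() == "BROKEN"
               for k, v in value)
```

Run against the corrected alias body (everything after `VIS_BALANCE =`), `has_enable_broken` returns True. A copy that drops the `(` before the first ADDRESS raises ValueError instead of being silently accepted.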
TCP Keepalive
**WARNING** Keepalive intervals are typically set to 2 hours or more (i.e., it can take more than 2 hours to notice a dead server even if keepalive is enabled). To make keepalive useful for TAF, the keepalive interval would need to be reduced to a smaller value (such as 2 minutes). Note: 249213.1
Default Settings
TCP Keepalive
Initial Settings
tcp_keepalive_time = 200 secs
tcp_keepalive_intvl = 20 secs
tcp_keepalive_probes = 2
After 200 seconds of inactivity, TCP sends the first of 2 probes, 20 seconds apart. If the probes go unanswered, TCP notifies SQL*Net of the failure, and SQL*Net removes the offending connection.
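The detection time implied by these settings is easy to estimate. This sketch assumes Linux keepalive semantics: the first probe is sent after tcp_keepalive_time idle seconds, probes repeat every tcp_keepalive_intvl seconds, and the connection is dropped after tcp_keepalive_probes unanswered probes. For comparison it uses the usual Linux defaults of 7200/75/9.

```python
def keepalive_detection_seconds(keepalive_time: int, keepalive_intvl: int,
                                keepalive_probes: int) -> int:
    """Worst-case seconds from the last traffic until the peer is declared dead.

    Assumes Linux semantics: the first probe goes out after `keepalive_time`
    idle seconds, unanswered probes repeat every `keepalive_intvl` seconds,
    and the connection is dropped once `keepalive_probes` probes have gone
    unanswered.
    """
    return keepalive_time + keepalive_probes * keepalive_intvl


print(keepalive_detection_seconds(200, 20, 2))   # 240 (settings above)
print(keepalive_detection_seconds(7200, 75, 9))  # 7875 (Linux defaults, about 2 h 11 min)
```

The second figure is why the warning above says a dead server can go unnoticed for more than 2 hours with default keepalive settings.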
TCP Retries
tcp_retries1 (default: 3): the number of times TCP will attempt to retransmit a packet on an established connection normally, without the extra effort of getting the network layers involved.
tcp_retries2 (default: 15): the maximum number of times a TCP packet is retransmitted in established state before giving up.
tcp_syn_retries (default: 5): the maximum number of times initial SYNs for an active TCP connection attempt will be retransmitted. The default of 5 corresponds to approximately 180 seconds.
TCP Retries
Now let's consider changing the following TCP parameters from their default values:
tcp_retries1 = 2
tcp_retries2 = 2
tcp_syn_retries = 2
In this example, the time to initiate PCP failover averaged 8 seconds after these TCP parameters were changed.
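The effect of the retry counts can be estimated with exponential backoff. This sketch makes several assumptions. The timeout doubles after every retransmission and is capped at 120 seconds, and TCP waits one last timeout after the final retry. The initial SYN timeout is 3 seconds, as in older Linux kernels (newer ones start at 1 second). Established connections use a 200 ms minimum retransmission timeout. Real timeouts grow with the measured round-trip time, so the established-connection figures are lower bounds rather than predictions.

```python
def retransmission_time(initial_rto: float, retries: int,
                        max_rto: float = 120.0) -> float:
    """Seconds TCP spends on a segment before giving up (sketch).

    The timeout doubles after each retransmission, is capped at `max_rto`,
    and one final timeout elapses after the last retry.
    """
    total, rto = 0.0, initial_rto
    for _ in range(retries + 1):
        total += min(rto, max_rto)
        rto *= 2
    return total


print(retransmission_time(3, 5))               # 189.0 (tcp_syn_retries = 5)
print(retransmission_time(3, 2))               # 21.0  (tcp_syn_retries = 2)
print(round(retransmission_time(0.2, 15), 1))  # 924.6 (tcp_retries2 = 15)
print(round(retransmission_time(0.2, 2), 1))   # 1.4   (tcp_retries2 = 2)
```

Under these assumptions, 189 seconds is close to the "approximately 180 seconds" quoted for the default tcp_syn_retries. Lowering the counts to 2 shrinks the give-up time from minutes to seconds.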
PMON
Process monitor session started : 18-JAN-2009 22:38:57
CONC-SM TNS FAIL Call to PingProcess failed for OAMGCS
18-JAN-2009 22:38:58 - Node:(RH7), Service Manager:(FNDSM_RH7_VIS) currently unreachable by TNS
Found dead process: spid=(11234), cpid=(1321563), ORA pid=(167), manager=(0/4)
PMON
Shutting down Internal Concurrent Manager : 18-JAN-2009 22:02:01
18-JAN-2009 22:02:01 The ICM has lost its database connection and is shutting down.
Spawning reviver process to restart the ICM when the database becomes available again.
Spawned reviver process 10910.
ICM parameters are read from batchmgr.sh when adcmctl.sh runs. Changing these parameters here does not change batchmgr.sh!
$FND_TOP/bin/batchmgr.sh
Make sure the PMON changes are made in the $FND_TOP/bin/batchmgr.sh file.
# FILENAME
# batchmgr
# DESCRIPTION
# fire up Internal Concurrent Manager process
# USAGE
# batchmgr arg1=val1 arg2=val2 ...
#
# Parameters may be sent via the environment.
#
# ARGUMENTS
# [appmgr|sysmgr]=username/password
# [sleep=sleep_seconds]
# [mgrname=manager_name]
# [logfile=log_filename]
# [restart=N|mim minutes between restarts]
# [mailto="user1 user2..."]
# [PRINTER=printer_name]
# [pmon=iterations]
# [quesiz=pmon_iterations]
# [diag=Y|N]
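Because batchmgr.sh takes plain name=value arguments, the PMON-related values can be assembled and sanity-checked before the script is edited. This is a sketch with hypothetical helper names. Reading the usage text above, it assumes that pmon counts ICM sleep cycles, so one PMON cycle lasts roughly sleep * pmon seconds. Credentials (appmgr/sysmgr) are deliberately left out.

```python
# Argument names taken from the batchmgr USAGE header (credentials omitted).
ALLOWED = {"sleep", "mgrname", "logfile", "restart", "mailto",
           "PRINTER", "pmon", "quesiz", "diag"}


def batchmgr_args(**params: object) -> list[str]:
    """Build name=value arguments in the form shown in the USAGE header."""
    unknown = set(params) - ALLOWED
    if unknown:
        raise ValueError(f"unrecognized batchmgr arguments: {sorted(unknown)}")
    return [f"{name}={value}" for name, value in params.items()]


def pmon_cycle_seconds(sleep: int, pmon: int) -> int:
    """Approximate PMON cycle length, assuming pmon counts sleep cycles."""
    return sleep * pmon


print(batchmgr_args(sleep=20, pmon=2, quesiz=1))  # ['sleep=20', 'pmon=2', 'quesiz=1']
print(pmon_cycle_seconds(20, 2))                  # 40
```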
reviver.log
reviver.sh starting up...
[ Mon Jan 12 20:02:15 MST 2009 ] - Read APPS username/password.
[ Mon Jan 12 20:02:45 MST 2009 ] - Attempting database connection...
[ Mon Jan 12 20:02:45 MST 2009 ] - Successful database connection.
[ Mon Jan 12 20:02:45 MST 2009 ] - Killing previous ICM session...
1 row updated.
Commit complete.
[ Mon Jan 12 20:02:45 MST 2009 ] - Looking for a running ICM process...
[ Mon Jan 12 20:02:45 MST 2009 ] - ICM now running, reviver.sh complete.
reviver.sh
reviver.sh code summary:
Sleep 30
Test_connection
Kill_old_icm
  Get session
  Alter system kill session
Check_running_icm
  Fnd_conc.ecm_alive
start_icm
  startmgr.sh
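The control flow in this summary can be sketched as follows. The four callables are hypothetical stand-ins for the SQL and shell steps in reviver.sh: the connection test, ALTER SYSTEM KILL SESSION, the running-ICM check and startmgr.sh. This illustrates the sequence only; it is not the script itself.

```python
import time
from typing import Callable


def revive_icm(test_connection: Callable[[], bool],
               kill_old_icm_session: Callable[[], None],
               icm_running: Callable[[], bool],
               start_icm: Callable[[], None],
               sleep_seconds: float = 30,
               sleep: Callable[[float], None] = time.sleep) -> None:
    """Follow the reviver.sh sequence summarized above."""
    # Sleep 30 / Test_connection: wait until the database accepts connections.
    while True:
        sleep(sleep_seconds)
        if test_connection():
            break
    # Kill_old_icm: remove the database session left by the dead ICM.
    kill_old_icm_session()
    # Check_running_icm: nothing more to do if an ICM is already running.
    if icm_running():
        return
    # start_icm: otherwise start the ICM (reviver.sh uses startmgr.sh).
    start_icm()


# Example with fake steps: the database comes back on the second attempt.
events = []
attempts = iter([False, True])
revive_icm(lambda: next(attempts),
           lambda: events.append("kill"),
           lambda: False,
           lambda: events.append("start"),
           sleep=lambda s: events.append(f"sleep {s}"))
print(events)  # ['sleep 30', 'sleep 30', 'kill', 'start']
```

Injecting `sleep` keeps the example instant. In real use the default `time.sleep` gives the 30-second pause visible between the first two reviver.log timestamps.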
Implement DCD
Implement by adding SQLNET.EXPIRE_TIME = 1 (minutes) to the sqlnet.ora file on the database server. If the connection is idle for the interval specified (in minutes) by SQLNET.EXPIRE_TIME, the server-side process sends a small 10-byte packet to the client over TCP/IP.
If the ping fails, FNDIMON checks whether more than four PMON cycles have passed since the ICM last updated the work_start column in fnd_concurrent_queues. If so, FNDIMON concludes the ICM is dead.
DCD comes into the picture after the ICM has crashed, when the database needs to detect that the ICM is gone. The database must then clean up the dedicated server process corresponding to the ICM client process.
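The FNDIMON decision described above reduces to a simple comparison. This is a sketch only: the function and parameter names are hypothetical, and the PMON cycle length is passed in rather than read from fnd_concurrent_queues.

```python
from datetime import datetime, timedelta


def icm_considered_dead(ping_ok: bool, work_start: datetime, now: datetime,
                        pmon_cycle: timedelta, max_cycles: int = 4) -> bool:
    """Return True when the ICM should be treated as dead.

    The ping must have failed AND more than `max_cycles` PMON cycles must
    have passed since the ICM last updated work_start.
    """
    if ping_ok:
        return False  # a successful ping means the ICM is alive
    return now - work_start > pmon_cycle * max_cycles


last_update = datetime(2008, 8, 15, 10, 0, 0)
cycle = timedelta(seconds=40)  # e.g. sleep=20, pmon=2 (assumed)
print(icm_considered_dead(False, last_update, last_update + timedelta(seconds=150), cycle))  # False
print(icm_considered_dead(False, last_update, last_update + timedelta(seconds=170), cycle))  # True
```

This also shows why PMON tuning matters for failover time: with a 40-second cycle, the ICM is declared dead after 160 seconds of silence.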
Be aware that if a TCP failure is not detected, failover will not occur. The following excerpt from a concurrent manager log shows this:
fdpsrp() (running_processes correction): ICM cannot obtain exclusive lock on FND_CONCURRENT_QUEUES
Oracle error code returned: 1
This message is information and does not indicate a problem with CP functionality.
remote call function (FNDIMON) 15-AUG-2008 10:06:02 - Function to call: PingProcess
The PingProcess continues until the CP processes resume, or a TCP failure is detected, and failover is begun.
reviver.sh DCD
10 minutes
30 secs
200
20
15
1 minute
15 secs
200
20
15
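285 secs / 35 min | 8 secs / 105 secs | 10 secs / 42 secs | 7 secs / 40 secs | 6 secs / 34 secs
4 | 4 | 4 | 4 | 2
60 | 60 | 20 | 20 | 20
10 | 10 | 2 | 2 | 2
3 | 2 | 2 | 2 | 2 (tcp_retries1)
15 | 2 | 2 | 2 | 2 (tcp_retries2)
5 | 2 | 2 | 2 | 2 (tcp_syn_retries)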
Concurrent Managers
Processes - Actual = 1 and Target = 1, manager is running
Processes - Actual = 0 and Target = 1, manager is running
Actual Processes = 0
PCP Setup
TCP Failure
TCP was disconnected at 2:57:25. Ten seconds after the TCP connection was pulled, OAM registered the failure of services on RH9 and reported the status above.
CRM is DOWN
CRM Failure
Adding Node:(RH9), to unavailable list
Found dead process: spid=(9696), cpid=(1321449), ORA pid=(80), manager=(0/0)
Found dead process: spid=(9784), cpid=(1321458), ORA pid=(114), manager=(0/0)
Found dead process: spid=(9783), cpid=(1321457), ORA pid=(104), manager=(0/0)
Found running request 4413565 attached to dead manager process.
Attempting to restart request.
Internal Concurrent Manager found node RH9 to be down. Adding it to the list of unavailable nodes.
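When comparing failover runs, it can help to pull the "Found dead process" entries out of the ICM log. This is a minimal sketch using Python's re module. It assumes only the two message formats shown in this document: one with ORA pid/manager and one with Service Instance. The field names in the returned dictionaries are hypothetical.

```python
import re

# Matches both formats shown in this document:
#   Found dead process: spid=(9696), cpid=(1321449), ORA pid=(80), manager=(0/0)
#   Found dead process: spid=(962754), cpid=(2259578), Service Instance=(1045)
DEAD_PROCESS = re.compile(
    r"Found dead process: spid=\((\d+)\), cpid=\((\d+)\)"
    r"(?:, ORA pid=\((\d+)\))?"
    r"(?:, manager=\((\d+/\d+)\))?"
    r"(?:, Service Instance=\((\d+)\))?"
)


def dead_processes(log_text: str) -> list[dict]:
    """Return one dict per 'Found dead process' message in `log_text`."""
    entries = []
    for m in DEAD_PROCESS.finditer(log_text):
        spid, cpid, ora_pid, manager, service = m.groups()
        entries.append({
            "spid": int(spid),
            "cpid": int(cpid),
            "ora_pid": int(ora_pid) if ora_pid else None,
            "manager": manager,  # kept as the raw "x/y" text from the log
            "service_instance": int(service) if service else None,
        })
    return entries


log = ("Found dead process: spid=(9696), cpid=(1321449), ORA pid=(80), manager=(0/0)\n"
       "Found dead process: spid=(962754), cpid=(2259578), Service Instance=(1045)")
for entry in dead_processes(log):
    print(entry["spid"], entry["ora_pid"], entry["service_instance"])
```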
RH9 is DOWN
PCP is DOWN
The ICM and CRM failed over to RH7 in about 1 minute and 30 seconds
Request Failover
Note that the Inventory Manager, MRP Manager and OAM Metrics Collection Manager are not set up to fail over.
Failback
Failback: TCP reconnected at 31:40. The host RH9 became available in OAM about 2 minutes later.
RH9 available
ICM Failback
Failback Complete
Shutdown of CP
When a node fails, the processes that were running on the failed node are restarted on secondary nodes. However, the workload from a resource-intensive node may overload the secondary node when it fails over.
Conversely, if a failover occurs from node 1 to node 2, we may want to reduce the number of failover processes; however, this doesn't work. Failover processes take effect only if the node fails.
Failover Processes
The PO Document Approval Manager and the Standard Manager reduce their number of processes when RH7 fails. When RH9 fails, the number of processes for managers that run on RH7 is not reduced.
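The behavior described on these slides can be modeled in a few lines. This is a sketch under stated assumptions: WorkShift and target_processes are hypothetical names, and the normal process count of 10 in the example is made up. The only rule encoded is the one stated here. Failover processes apply when the primary node fails and a secondary node is defined, as R12 requires.

```python
from dataclasses import dataclass


@dataclass
class WorkShift:
    processes: int           # target processes while the primary node is up
    failover_processes: int  # target processes after a node failure


def target_processes(shift: WorkShift, primary_node_up: bool,
                     secondary_defined: bool) -> int:
    """Processes a manager should run, under the rule described above."""
    if primary_node_up:
        return shift.processes
    if not secondary_defined:
        return 0  # no secondary node: the manager does not fail over
    return shift.failover_processes  # only a node failure applies this value


standard = WorkShift(processes=10, failover_processes=3)
print(target_processes(standard, primary_node_up=True, secondary_defined=True))   # 10
print(target_processes(standard, primary_node_up=False, secondary_defined=True))  # 3
```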
References
249213.1 - Performance problems with Failover when TCP Network goes down
364171.1 - TAF Session Hangs, Select Fails To Complete W/ Loss Of NIC: Tune TCP Keepalive
211362.1 - Process Monitor Session Cycle Repeats Too Frequently
291201.1 - How To Remove a Dead Connection to the Target Database
362135.1 - Configuring Oracle Applications Release 11i with Oracle10g Release 2 Real Application Clusters and Automatic Storage Management
Optimizing the E-Business Suite with Real Application Clusters (RAC) - Ahmed Alomari
240818.1 - Concurrent Processing: Transaction Manager Setup and Configuration Requirement in an 11i RAC Environment
R12 ATG - Concurrent Processing Functional Overview - Aaron Weisberg
210062.1 - Generic Service Management (GSM) in Oracle Applications 11i
271090.1 - Parallel Concurrent Processing Failover/Failback Expectations
241370.1 - Concurrent Manager Setup and Configuration Requirements in an 11i RAC Environment
602899.1 - Some More Facts On How to Activate Parallel Concurrent Processing