
Oracle 10g Real Application Clusters

Training

March 21, 2017
TCS Internal
Agenda

10g RAC

RAC Architecture
Clusters in general
Clusterware and Interconnect
Storage: SAN, NAS, NFS

Oracle Clusterware (CRS)
CRS Components
OCR, VIP

RAC Theory and RAC Administration
Memory Structures and Background Processes
Initialization Params
Undo Management and Temp Space Mgmt
Cache Fusion Internals
Redo log and backup considerations
Workload Balancing
Tools/Utilities for CRS, 10g RAC
Scaling RAC databases
CRS Installation Example



Real Application Clusters



What is a Cluster?

A group of independent computers working together as a single system
Availability - Continues running in case of a hardware or software failure
Scalability - New nodes can be added to a cluster to accommodate increased workload
Performance - Workload can be distributed among nodes for optimal performance
Requires
hardware (interconnect)
software (clusterware)



Key Terminology

Shared memory-Shared disk


Shared nothing
Active/passive
Active/active

SAN/NAS
NIC
Switch
Interconnect
Clusterware



Real Application Clusters: What is it?

Multiple Oracle instances running on multiple nodes and sharing a single physical database
All instances have common data, control, and initialization files
Each instance has its own redo log files (on shared storage) and rollback segments or undo tablespace
All instances can simultaneously execute transactions against the single database
Caches are synchronized using Oracle's Global Cache Management technology (Cache Fusion)



Real Application Clusters

[Figure: network users and a centralized management console connect to clustered database servers. The servers are linked by a low-latency, high-speed interconnect (VIA or proprietary) through a switch, and share a mirrored disk subsystem over a storage area network (hub or switch fabric). Callouts: No Single Point Of Failure; Drive and Exploit Industry Advances in Clustering.]



Real Application Clusters Architecture



RAC Hardware Considerations

Interconnect is typically a standard Gigabit Ethernet (GigE) network
UDP (default)
LLT
IP over InfiniBand is supported; more recently, RDS over InfiniBand as well
Private network should use a dedicated, non-routable switch or VLAN
For high availability and scalability, use an OS-based solution (such as NIC bonding or teaming) to combine multiple physical links into a single logical link
Raw disk or filesystems?
Performance vs ease of use
Direct I/O: best of both worlds?
Filesystem single-writer lock bottleneck
Concurrent Direct I/O



What is the Interconnect?
Instances communicate with each other over the interconnect (network)
Information transferred between instances includes
data blocks
locks
SCNs
Typically 1 Gb (Gigabit) Ethernet
UDP protocol
Often teamed in pairs to avoid single points of failure (SPOFs)
Can also use InfiniBand
Lower latency
Higher bandwidth
Costlier

Other proprietary protocols are available



Why Use Shared Storage?
Mandatory for
Database files
Control files
Online redo logs
Server Parameter file (if used)
Optional for
Archived redo logs (recommended)
Executables (Binaries)
Password files
Parameter files
Network configuration files
Administrative directories
Alert Log
Dump Files



What Shared Storage is Supported?

Oracle supplied options


Oracle Cluster File System (OCFS)
Version 1
Windows and Linux
Supports database and archived redo logs
No executables
Version 2 - August 2005
Linux, Windows and Solaris
As OCFS1 plus executables
Automatic Storage Management (ASM)
Oracle 10.1 and above
More transparent in Oracle 10.2 and above
Both require underlying SAN or NAS
Do not require LVM



What Shared Storage is Supported?
Can use (continued)
Network Attached Storage
NFS-based
Potentially lower cost - no fibre channel required
Easy to administer
Raw devices
Difficult to administer
Cannot be used with archived redo logs
Third-party Cluster File System
Still a popular choice with many sites
Others (not supported)
Firewire - maximum two nodes - recommended in 10g
NBD - Network Block Devices - Solaris and Linux
NFS - not supported, but might still work



Cache Fusion Architecture

Full Cache Fusion
Cache-to-cache data shipping
Shared cache eliminates slow I/O
Enhanced IPC
Allows flexible and transparent deployment

[Figure: users connect to instances whose buffer caches are joined by Cache Fusion into one shared cache]



Oracle RAC Cache Fusion: What it does

[Figure: two instances, each with GES & GCS and a Shared Memory/Global Area (shared log, SQL buffer), on top of a Shared Disk Database. Numbered steps shown in the figure:]
1. Read Block A
2. Read Block A
3. Cache Transfer: Block A (Read)
4. Update Block A
5. Update Block A
6. Cache Transfer: Block A (Update)



Cache Fusion: Inter Instance Block Requests

Readers and writers accessing instance A gain access to blocks in instance B's buffer cache
All types of block contention and access
Coordination by Global Cache/Enqueue Services

Request for Block (Cache A)    Lock Status of Block in Cache B
Read                           Read
Read                           Write
Write                          Read
Write                          Write



Cache Fusion: Instance Under the hood

[Figure: Node 1, Node 2, ... Node n are connected by the Cluster Private High Speed Network. Each node runs one instance (Instance 1, Instance 2, ... Instance n) with:
an SGA containing the Dictionary Cache, Library Cache, Log buffer and Buffer Cache
a Global Resource Directory
background processes LMON, LMD0, DIAG, LCK0, LGWR, DBW0, LMS0, SMON and PMON
its own Redo Log Files
All instances share the Data Files and Control Files.]



Cache Fusion Details: GES & GCS

Global Enqueue Service (GES)
Coordinates the requests of all global enqueues (any non-buffer-cache resources)
Deadlock detection and timeout of requests
Manages resource caching/cleanup

Global Cache Service (GCS)
Guarantees cache coherency
Manages caching of shared data via Cache Fusion
Minimizes access time to data which is not in local cache and would otherwise be read from disk or rolled back
Implements fast direct memory access over high-speed interconnects for all data blocks and types
Uses an efficient and scalable messaging protocol
Maintains block mode for blocks with Global role
Responsible for block transfers between instances



Cache Fusion: Global Resource Directory

The data structures associated with global resources
Global Cache Services and Global Enqueue Services maintain the Resource Directory
Distributed across all instances in a cluster
Responsible for:
Maintaining the mode and role of cached database blocks
Maintaining block copies for recovery purposes (past images)



Cache Fusion Details: Instance Processes

Role of LMON:
Check for instance transition
Reconfiguration
Cleaning up of Cached Enqueue Resources

Role of LMD:
Receive and Process GES messages
Deadlock Detection and Request Timeout

Role of LMSn (n = 0-9; higher in 10gR2):
Receive and Process GCS messages
Buffer Cache Operations & Transfers



Cache Fusion Details: Resource Modes

3 Resource Modes for global cache resources (cached database blocks)
S (shared): used for blocks read into cache; any number of instances can hold a block in S mode
X (exclusive): used for blocks updated in cache; only 1 instance can hold a block in X mode
N (null): used for blocks not currently in cache



Cache Fusion Details: Resource Roles

2 Resource Roles for global cache resources
L (local): block can be manipulated by the instance without further global requests
Block can be held in X, S, or Null mode
Block can be served to other instances
G (global): block manipulation needs further instance coordination
Blocks can be dirty on many nodes
Instances can use a global status for consistent read when the block is held in X mode by another instance



Cache Fusion Details: Past Images

Only applicable to blocks with the Global resource role
Copy of a dirty block, kept when the block is transferred to another instance
Used for recovery purposes if necessary
Maintained until it, or a later version of the block, is written to disk



A Cache Transfer Example - write/write

Instance 2 holds the block in X Local mode with 0 past images (XL0); it has been updated (New)
This means that instance 2 was updating the block while no other instances were using this block
The Old block is still on disk (no PI needed)

[Figure: inst1 holds nothing; inst2 holds the New block as XL0; the Old block is on disk]



A Cache Transfer Example - write/write
Instance 1 requests the block for exclusive access (update)
GCS instructs instance 2 to downgrade the block and transfer it to instance 1
Instance 2 resource is now Null, Global with 1 past image (NG1)
Instance 1 now puts the block in X mode for the update

[Figure: inst1 holds the New block as XG0; inst2 changes from XL0 to NG1 and keeps a PI; the Old block is still on disk]
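
The write/write transfer can be traced with a small Python model. This is only a sketch of the state labels used on these slides (mode, role, past-image count); the class and function names are hypothetical and it does not represent Oracle's internal implementation.

```python
from dataclasses import dataclass


@dataclass
class BlockResource:
    """One instance's view of a cached block (hypothetical model)."""
    mode: str = "N"        # "N" (null), "S" (shared) or "X" (exclusive)
    role: str = "L"        # "L" (local) or "G" (global)
    past_images: int = 0   # number of past images (PIs) kept by this instance

    def label(self) -> str:
        # Same notation as the slides, e.g. XL0, XG0, NG1
        return f"{self.mode}{self.role}{self.past_images}"


def write_write_transfer(holder: BlockResource, requester: BlockResource) -> None:
    """Holder has the dirty block in X mode; requester wants it for update."""
    if holder.mode != "X":
        raise ValueError("holder must have the block in X mode")
    # Holder downgrades to Null, its role becomes Global, and it keeps a PI.
    holder.mode, holder.role = "N", "G"
    holder.past_images += 1
    # Requester receives the current block and holds it in X mode, Global role.
    requester.mode, requester.role = "X", "G"


inst2 = BlockResource(mode="X", role="L")   # XL0: updated, no other users
inst1 = BlockResource()                     # block not cached on instance 1
write_write_transfer(holder=inst2, requester=inst1)
print(inst1.label(), inst2.label())
```

The two labels printed at the end correspond to the XG0 and NG1 states in the example.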



RAC Instance Crash Recovery

Single-instance crash recovery involves, at a high level:
Read from redo logs to identify blocks needing recovery
Start with the block on disk
Apply changes present in redo logs

RAC instance crash recovery is a bit more involved:
Surviving instances realize recovery is needed
GRD is frozen
Resource re-mastering is done
Identify all blocks to be recovered from the redo thread
Check for PI(s) of the block
Start recovering blocks that need recovery (release other blocks, unfreeze GRD)
Write blocks as soon as recovered



10g RAC Architecture



What does Clusterware give?



10g Clusterware Architecture



Clusterware: Group Membership and Heartbeats

Cluster needs to know who is a member at all times

Oracle Clusterware has 2 heartbeats

Network Heartbeat - If a node does not send a heartbeat within MissCount (time in seconds), the node is evicted from the cluster. A MissCount of less than 30s is not supported

Disk Heartbeat - If a node does not update the voting disk within the I/O timeout, the node is evicted from the cluster
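
As a rough illustration of the two heartbeat rules, the Python sketch below flags nodes whose last network or disk heartbeat is older than the allowed interval. It is a simplified model: the function name, the timestamps and the timeout values are hypothetical, and the real Clusterware logic is more involved.

```python
def nodes_to_evict(now, last_network_hb, last_disk_hb, misscount, disk_timeout):
    """Return the nodes that missed either heartbeat (simplified model).

    now, last_*_hb values: times in seconds (hypothetical clock)
    misscount:    allowed gap between network heartbeats, in seconds
    disk_timeout: allowed gap between voting-disk updates, in seconds
    """
    evicted = []
    for node in sorted(last_network_hb):
        network_late = now - last_network_hb[node] > misscount
        disk_late = now - last_disk_hb[node] > disk_timeout
        if network_late or disk_late:
            evicted.append(node)
    return evicted


# Hypothetical timings: node2 stopped sending network heartbeats,
# node3 stopped updating the voting disk.
network = {"node1": 95, "node2": 60, "node3": 98}
disk = {"node1": 99, "node2": 99, "node3": 10}
print(nodes_to_evict(100, network, disk, misscount=30, disk_timeout=60))
```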



What is a Voting Disk?

Known as Quorum Disk / File in Oracle 9i

Located on shared storage accessible to all instances

Used to determine RAC instance membership

In the event of node failure, the voting disk is used to determine which instance takes control of the cluster
Avoids split brain

In Oracle 10.2 and above can be mirrored
Odd number of copies (1, 3, 5 etc)
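
The reason for an odd number of copies can be shown with a majority check. The sketch below assumes the usual rule that a node must be able to access more than half of the configured voting disks; the function names are hypothetical.

```python
def has_voting_quorum(accessible: int, configured: int) -> bool:
    """True if a node can access a strict majority of the voting disks."""
    return accessible > configured // 2


def tolerated_disk_failures(configured: int) -> int:
    """How many voting disks can be lost while a majority remains."""
    return (configured - 1) // 2


for copies in (1, 2, 3, 4, 5):
    print(copies, "copies tolerate", tolerated_disk_failures(copies), "failure(s)")
```

Under this assumption an even count adds a disk without tolerating any extra failure, which is why 1, 3 or 5 copies are suggested.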



Clusterware: Split Brain Resolution

What is Split Brain?

Occurs when the interconnect breaks and the nodes can no longer communicate with each other
Oracle Clusterware keeps the largest possible sub-cluster up and evicts the other nodes; in a 2-node cluster, the lowest-numbered node remains

I/O fencing, similar to the STONITH algorithm

Voting disk is used to detect network problems that could lead to split brain
Final arbiter of the status of configured nodes, either up or down, and delivers eviction notices

At least 3 voting disks are recommended

Standard NFS support is now available for a 3rd voting disk on Linux, AIX, HP-UX and Solaris
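
The survivor rule described above can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, and breaking ties between equally sized sub-clusters by lowest node number is an assumption that generalizes the 2-node rule from the slide.

```python
def surviving_subcluster(subclusters):
    """Pick the sub-cluster that stays up after the interconnect breaks.

    subclusters: list of non-empty lists of node numbers.
    Largest sub-cluster wins; on a tie, the one containing the
    lowest-numbered node wins (assumed tie-break).
    """
    return max(subclusters, key=lambda nodes: (len(nodes), -min(nodes)))


print(surviving_subcluster([[1, 2], [3]]))   # larger sub-cluster survives
print(surviving_subcluster([[2], [1]]))      # 2-node cluster: lowest node survives
```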



Oracle Cluster Registry (OCR)

A repository containing the definition of the configuration of the cluster and the status of resources managed by the cluster
Stored on raw devices or a cluster file system (Veritas Storage Foundation, Sun Cluster, OCFS on Linux)
Required file for Oracle Clusterware
Initialized during Clusterware installation
Location defined in ocr.loc (on Windows the location is in the registry)
Mirrored by Oracle Clusterware or externally (RAID)
Automatically backed up by Oracle Clusterware (manual backup available in 11g)

Tools to manage

OCRCONFIG: manages restore, import, export, repair and replace operations

OCRCHECK: checks the integrity of the OCR and displays the OCR's block format, total space available, used space, and the locations you have configured

OCRDUMP: dumps the contents of the OCR into a standard ASCII text file



What is VIP?
Node application introduced in Oracle 10.1

Allows Virtual IP address to be defined for each node

All applications connect using Virtual IP addresses

If node fails Virtual IP address is automatically relocated to another node

Only applies to newly connecting sessions



What is TAF?
TAF is Transparent Application Failover

Sessions connected to a failed instance will be terminated


Uncommitted transactions will be rolled back

Sessions can be reconnected to another instance automatically if using TAF


Can optionally re-execute in-progress SELECT statements
Statement re-executed with same SCN
Fetches resume at point of failure
Session state is lost including
Session parameters
Package variables
Class and ADT instantiations



What is TAF?
TAF is Transparent Application Failover

Requires additional coding in client

Requires configuration in TNSNAMES.ORA

RAC_FAILOVER =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (FAILOVER = ON)
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = RAC)
      (SERVER = DEDICATED)
      (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 30)(DELAY = 5))
    )
  )
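
To give a feel for what RETRIES and DELAY mean, the Python sketch below models a reconnect loop: each attempt tries every address in turn, then waits DELAY seconds before the next attempt. This is a simplified, assumed model of the behaviour, not the Oracle client's code; reconnect and fake_connect are hypothetical names.

```python
import time


def reconnect(connect, hosts, retries=30, delay=5, sleep=time.sleep):
    """Try to re-establish a session, in the spirit of RETRIES/DELAY.

    connect: callable taking a host name, returning a session or
             raising ConnectionError (supplied by the caller).
    """
    for attempt in range(1, retries + 1):
        for host in hosts:
            try:
                return connect(host)
            except ConnectionError:
                pass  # try the next address in the list
        if attempt < retries:
            sleep(delay)  # wait before the next round of attempts
    raise ConnectionError("all reconnect attempts failed")


def fake_connect(host):
    # Hypothetical stand-in for a real connection call: node1 is down.
    if host == "node1":
        raise ConnectionError(host)
    return f"session@{host}"


print(reconnect(fake_connect, ["node1", "node2"], retries=30, delay=5))
```

With RETRIES=30 and DELAY=5, this model would keep trying for roughly two and a half minutes before giving up.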



RAC Connection Balancing

[Figure: a client connects through the listeners to one of the cluster database instances; each instance (Shared Memory/Global Area with shared log and SQL buffer) reports its CPU load]

Instances register with listeners when started
Nodes report CPU usage back to registered listeners (PMON)
Listener chooses least used node when connection needed
Supports both Shared Server and Dedicated Server configurations



RAC Workload Balancing: Parallel Query

[Figure: parallel query execution across Node 1, Node 2, Node 3 and Node 4, driven by the query coordinator]

Query slaves have node affinity for the query coordinator but will expand to other nodes if needed
Queries and DML can utilize all nodes in the cluster
Parallel-aware optimizer spreads single queries across instances when applicable
Optimizer may also choose to use only one node to satisfy a request (Adaptive Parallel Query)
Once started, all nodes work until the entire operation is completed



What is Workload Balancing?
Balancing of workload across available instances
Can have
Client-side connection balancing
Server-side connection balancing

Client-side connection balancing


Workload distributed randomly across nodes

RAC =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = node2)(PORT = 1521))
    (LOAD_BALANCE = ON)
    (FAILOVER = ON)
    (CONNECT_DATA =
      (SERVICE_NAME = RAC)
      (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC))
    )
  )
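
The effect of LOAD_BALANCE and FAILOVER on the address list can be sketched as follows. This Python model is illustrative only: the function names and the connect callable are hypothetical, and it simply shuffles the addresses and tries them in order.

```python
import random


def connection_order(addresses, load_balance=True, rng=random):
    """Order in which addresses are tried: random if load balancing is on."""
    order = list(addresses)
    if load_balance:
        rng.shuffle(order)
    return order


def connect_with_failover(addresses, connect, load_balance=True,
                          failover=True, rng=random):
    """Try addresses in order; with failover off, only the first is tried."""
    order = connection_order(addresses, load_balance, rng)
    if not failover:
        order = order[:1]
    last_error = None
    for host in order:
        try:
            return connect(host)
        except ConnectionError as exc:
            last_error = exc
    raise ConnectionError("no address accepted the connection") from last_error


def fake_connect(host):
    # Hypothetical stand-in for a real connection call: node1 is down.
    if host == "node1":
        raise ConnectionError(host)
    return f"session@{host}"


print(connect_with_failover(["node1", "node2"], fake_connect))
```

Because the choice is random, the workload evens out only over many connections; it does not take the current load of each node into account.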



What is Workload Balancing?
Server-side connection balancing
Dependent on current workload on each node
PMON monitors workload and updates listeners
Depends on long or short connections

In Oracle 10.1
Set PREFER_LEAST_LOADED_NODE in listener.ora
OFF for long connections
ON for short connections (default)

In Oracle 10.2
Can specify load balancing goal for each service
NONE, SERVICE_TIME or THROUGHPUT
Can also specify connection load balancing goal
SHORT or LONG
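
Server-side balancing can be pictured as the listener picking the instance with the lowest reported load. The Python sketch below is a deliberately simple model with hypothetical names and load figures; the metrics a real listener uses depend on the configured goal.

```python
def pick_instance(reported_load):
    """Return the instance with the lowest reported load.

    reported_load: dict of instance name -> load figure (hypothetical,
    e.g. a CPU utilisation between 0 and 1). Ties go to the name that
    sorts first, so the result is deterministic.
    """
    return min(sorted(reported_load), key=reported_load.get)


load = {"RAC1": 0.80, "RAC2": 0.35}
print(pick_instance(load))
```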



Increasing Scalability for your RAC
If application scales well on a single-instance then it should scale well on RAC

Eliminate contention
Use sequence caching
Reverse key indexes

Use locally partitioned tables and indexes


Attempt to achieve node affinity

Avoid contention for single blocks


Distribute rows for hot blocks
Small block size e.g. 2048 or 4096
High PCTFREE / Low PCTUSED
Filler columns e.g. CHAR (2000)
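
Why reverse key indexes reduce contention can be seen by reversing the bytes of sequential keys. The Python sketch below uses fixed-width integers purely for illustration; Oracle's internal key encoding is different, and reverse_key is a hypothetical helper.

```python
def reverse_key(key: int, width: int = 4) -> bytes:
    """Byte-reversed form of an integer key (illustrative encoding only)."""
    return key.to_bytes(width, "big")[::-1]


sequential = [255, 256, 257]
for key in sequential:
    print(key, reverse_key(key).hex())

# Index order follows the reversed bytes, so neighbouring key values
# no longer sit next to each other in the index.
print(sorted(sequential, key=reverse_key))
```

Keys generated one after another by different instances therefore tend to land in different index blocks instead of all competing for the right-most leaf block.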



Increasing Scalability for your RAC
Use Automatic Segment Space Management
Default in Oracle 10.2

Use larger block size for read-only objects


Reduce number of GCS messages required

Minimize lock usage


Eliminate unnecessary parsing
Increase size of shared pool
Bind variables
Cursor sharing

Use optimistic locking


Eliminate unnecessary SELECT FOR UPDATE statements



Administration



Managing CRS
Run crsctl as root

crsctl start crs
crsctl stop crs
crsctl check crs

Starting and stopping during OS startup/shutdown (preferred method)

CRS places a start/stop script in an OS-dependent location
Solaris: /etc/init.d/init.crs
The OS has to schedule this script during system startup (a task for the Unix admins)

Log file locations for the various CRS processes (check MetaLink for changes in the latest version)

$CRS_HOME/log/<nodename>/<process>
process = css/crs/evm/racg
(or) use $CRS_HOME/bin/diagcollect.pl to collect all logs



Example: CRS Processes



Example: CRS Status crs_stat



Background Processes in a RAC instance



RAC related Background Processes



Server Parameter File
Introduced in Oracle 9.0.1
Must reside on shared storage
Shared by all RAC instances
Binary (not text) files
Parameters can be changed using ALTER SYSTEM
Can be backed up using the Recovery Manager (RMAN)
Created using

CREATE SPFILE [ = SPFILE_NAME ]
FROM PFILE [ = PFILE_NAME ];

init.ora file on each node must contain SPFILE parameter

SPFILE = <pathname>



Parameters
RAC uses same parameters as single-instance
Some must be different on each instance
Some must be same on each instance

Can be global or local

[*.]<parameter_name> = <value>
<sid>.<parameter_name> = <value>

Must be set using ALTER SYSTEM statement

ALTER SYSTEM SET parameter = value
[ SCOPE = MEMORY | SPFILE | BOTH ]
[ SID = <sid> ]
ALTER SYSTEM RESET parameter
[ SCOPE = MEMORY | SPFILE | BOTH ]
[ SID = <sid> ]
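
How global and instance-specific entries combine can be sketched in Python: an entry prefixed with the SID takes precedence over the "*." entry for the same parameter. The dictionary contents and the function name are hypothetical examples, not taken from a real parameter file.

```python
def effective_value(spfile_entries, sid, parameter):
    """Value an instance would use: the SID-specific entry wins over '*.'."""
    specific = spfile_entries.get(f"{sid}.{parameter}")
    if specific is not None:
        return specific
    return spfile_entries.get(f"*.{parameter}")


# Hypothetical entries for a two-instance database
entries = {
    "*.cluster_database": "TRUE",
    "*.undo_management": "AUTO",
    "RAC1.undo_tablespace": "UNDOTBS1",
    "RAC2.undo_tablespace": "UNDOTBS2",
    "RAC1.instance_number": "1",
    "RAC2.instance_number": "2",
}

print(effective_value(entries, "RAC1", "undo_tablespace"))
print(effective_value(entries, "RAC2", "undo_management"))
```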



Parameters
Some parameters must be same on each instance including *:
ACTIVE_INSTANCE_COUNT
ARCHIVE_LAG_TARGET
CLUSTER_DATABASE
CONTROL_FILES
DB_BLOCK_SIZE
DB_DOMAIN
DB_FILES
DB_NAME
DB_RECOVERY_FILE_DEST
DB_RECOVERY_FILE_DEST_SIZE
DB_UNIQUE_NAME
MAX_COMMIT_PROPAGATION_DELAY
TRACE_ENABLED
UNDO_MANAGEMENT
* Correct for Oracle 10.1



Parameters
Some parameters, if used, must be different on each instance including
THREAD
INSTANCE_NUMBER
INSTANCE_NAME
UNDO_TABLESPACE
ROLLBACK_SEGMENTS

DML_LOCKS must be identical on each instance if set to zero



What is SRVCTL?
Utility used to manage cluster database
Configured in Oracle Cluster Registry (OCR)
Controls
Database
Instance
ASM
Listener
Node Applications
Services
Options include
Start / Stop
Enable / Disable
Add / Delete
Show current configuration
Show current status



SRVCTL - Examples
Starting and Stopping a Database

srvctl start database -d RAC
srvctl stop database -d RAC

Starting and Stopping an Instance

srvctl start instance -d RAC -i RAC1
srvctl stop instance -d RAC -i RAC1

Starting and Stopping a Service

srvctl start service -d RAC -s SERVICE1
srvctl stop service -d RAC -s SERVICE1

Starting and Stopping ASM on a specified node

srvctl start asm -n node1
srvctl stop asm -n node1
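
For scripting, the commands above can be assembled programmatically. The Python helper below is a hypothetical sketch that only builds the argument list, using the flags shown in the examples (-d, -i, -s, -n); it does not run anything by itself.

```python
def srvctl_command(action, target, database=None, instance=None,
                   service=None, node=None):
    """Build an srvctl start/stop command as an argument list."""
    if action not in ("start", "stop"):
        raise ValueError("this sketch only covers start and stop")
    command = ["srvctl", action, target]
    for flag, value in (("-d", database), ("-i", instance),
                        ("-s", service), ("-n", node)):
        if value is not None:
            command += [flag, value]
    return command


print(" ".join(srvctl_command("start", "instance", database="RAC", instance="RAC1")))
print(" ".join(srvctl_command("stop", "asm", node="node1")))
```

The resulting list could be passed to subprocess.run on a cluster node where srvctl is on the PATH.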



What is CLUVFY?
Introduced in Oracle 10.2

Supplied with Oracle Clusterware


Can be downloaded from OTN (Linux and Windows)

Written in Java - requires JRE (supplied)

Also works with 10.1 (specify -10gR1 option)

Checks cluster configuration


stages - verifies all steps for specified stage have been completed
components - verifies specified component has been correctly installed



CLUVFY
For example, to check configuration before installing Oracle Clusterware on node1
and node2 use:

sh runcluvfy.sh stage -pre crsinst -n node1,node2

Checks:
node reachability
user equivalence
administrative privileges
node connectivity
shared storage accessibility

If any checks fail append -verbose to display more information



CRS Installation



DBCA
Can be used to
Create RAC database and instances
Create ASM instance
Manage ASM instance (10.2)
Add RAC instances
Create RAC database templates
structure only
with data
Create clone RAC database (10.2)
Create, Manage and Drop Services
Drop instances and database



Other Utilities
Additional RAC utilities and diagnostics include
OCRCONFIG
OCRCHECK
OCRDUMP
CRSCTL
CRS_STAT
