
InfoSphere CDC

Architecture and Functionality

2010 IBM Corporation


Information Management Software

CDC Architecture


CDC Log Based Architecture

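[Figure: CDC log-based architecture. On the source, the Publisher Engine (with its metadata) scrapes the journal log or redo/archive logs and pushes changes over TCP/IP to the Subscriber Engine (with its metadata), which applies them to the target and confirms. Targets shown: ODS, audit, database, BI appliance, message queue, Info Server. A Java-based GUI acts as a unified admin point with monitoring. Flow: Scrape, Push, Apply, Confirm.]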


CDC Detailed architecture


Description of Shared and Common Components


Component | Function | Description
Access Server | Monitoring and Configuration service | Controls all non-command line access to the replication environment. When you connect into the Management Console you are connected to the Access Server.
Admin API | Java based API interface to CDC commands | Optional Java-based programming interface that you can use to script operational configurations.
Command Line Interface | Command based operations | Allows you to administer datastore and user accounts, as well as perform administration scripting independent of the Management Console (MC).
Communication Layer (TCP/IP) | Communications between source and target | Acts as the dedicated network connection between the source and target.
Datastore | CDC definition | The source and target datastores represent the data files and CDC instances required for data replication.
Replication Engine | CDC instance process | Serves to send and receive data. CDC can operate as both a source capture engine and a target engine simultaneously.
Management Console | Monitoring and Configuration | The interactive application that you use to configure and monitor replication.


Description of Source Components


Component | Function | Description
Mirror | Changed data | Performs the replication of changes to the target table, or the accumulation of source table changes used to replicate changes to the target table.
Refresh | Synchronize source and target tables | Performs initial synchronization of the tables from the source database to the target.
Single Scrape | Log reading and parsing component | Acts as a source-only log reader and a log parser.
Source transformation engine | Transformations | Used to process row filtering, critical column filtering, encoding conversions, and other data to propagate to the target datastore engine.
Source database logs | Database log | Maintained by the source database for its own recovery purposes. CDC reads from these logs to minimize impact on the source database.


Description of Target Components


Component | Function | Description
Target transformation engine | Transformations | Used to process data and value translations, encoding conversions, user exits, conflict detections, and other data on the target datastore engine.
Apply agent | Apply changes to target | Acts as the agent on the target that processes changes as sent by the source.


CDC Single Scrape Architecture

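[Figure: single scrape architecture. In the source CDC instance, a log reader and log parser read the database logs into transaction queues and a change log (staging store, backed by disk). Subscriptions 1, 2 and 3 each read from the staging store and send changes to the matching subscription on the target CDC side, which applies them to its target. The target side is labelled JDBC.]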


Single (shared) Scrape

Shared scrape allows multiple subscriptions to use the same log reader(s) and parser thread per instance. The reader reads the source database transaction log, capturing changes for all the tables being mirrored by each subscription.
The changed data is stored in the CDC staging store.
When subscriptions are running, CDC keeps the staging store data in memory whenever possible to allow for the highest throughput, and uses disk storage when needed.
During a controlled shutdown, the portion of the staging store in memory is persisted to disk, allowing for a faster restart.
Data is removed from the staging store after it has been successfully applied on the target.
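
The staging store behaviour described above can be sketched as a toy model. This is only an illustration, not the product's implementation: the class `StagingStore` and its methods are hypothetical names, and the sketch assumes an entry becomes removable once every registered subscription has confirmed it.

```python
class StagingStore:
    """Toy model of a shared change log read by several subscriptions."""

    def __init__(self, subscriptions):
        self.entries = []  # (log position, change) in log order
        # Last log position each subscription has applied on its target.
        self.confirmed = {name: 0 for name in subscriptions}

    def scrape(self, position, change):
        """Called by the single log reader/parser for every captured change."""
        self.entries.append((position, change))

    def confirm(self, subscription, position):
        """A subscription reports that its target applied changes up to position."""
        self.confirmed[subscription] = max(self.confirmed[subscription], position)
        self._purge()

    def _purge(self):
        # Assumption: data is dropped only once the slowest subscription applied it.
        low_water_mark = min(self.confirmed.values())
        self.entries = [(p, c) for p, c in self.entries if p > low_water_mark]


store = StagingStore(["SUB1", "SUB2"])
for pos in range(1, 6):
    store.scrape(pos, f"change-{pos}")

store.confirm("SUB1", 5)         # SUB2 is idle, so nothing can be purged yet
held_back = len(store.entries)
store.confirm("SUB2", 3)         # positions 1..3 are now removable
remaining = [p for p, _ in store.entries]
```

In this model an idle subscription keeps every entry alive, which is one way to picture why staged data accumulates when subscriptions do not run together.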


Cleaning the staging store best practice

Here are the steps to clear the staging store:

1. Clean the TxQ files in <CDC install dir>/instance/<instance name>/conf.
2. Clean all files in the tmp directory. (TxQ files go to tmp if their size is too large.)
3. Clear the staging store using the dmclearstagingstore command-line utility.


Single Scrape Best Practices


All subscriptions should be running simultaneously, so that no idle subscription holds back the single scrape position and accumulates memory usage for staging purposes.
If a subscription is not run for a long time, consider setting the table to Parked status and setting its replication status to Refresh.
Avoid long-running open transactions. Use the following query to check:

  SELECT t.start_scnw, t.start_scnb, t.start_time,
         s.username, o.object_name, o.owner
  FROM   gv$transaction t, gv$session s,
         gv$locked_object l, dba_objects o
  WHERE  l.object_id IN (<INSCOPE TABLES OBJECT ID FROM DBA_OBJECTS>)
  AND    t.ses_addr  = s.saddr
  AND    t.xidusn    = l.xidusn
  AND    t.xidslot   = l.xidslot
  AND    t.xidsqn    = l.xidsqn
  AND    l.object_id = o.object_id


Heartbeat
A heartbeat message is sent on the control channel.
System parameters:
  Timeout if no heartbeat is received from the other side: global_shutdown_after_no_heartbeat_response_minutes = 10 (minutes)
  Frequency of heartbeat messages: global_heartbeat_interval_seconds = 15 (seconds)
The incoming data connection is usually idle. A firewall can close this idle connection once it exceeds the idle time period. To address this, configure a keep-alive message on the data channel:
  TCP_KEEPALIVE_SECS, configured in comms.ini

[Figure: source and target subscriptions connected by a control channel and a data channel. On the control channel a heartbeat is sent every 15 seconds and confirmed by the other side. On the data channel a keep-alive is sent every TCP_KEEPALIVE_SECS to prevent the firewall from closing sessions.]
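
As a rough illustration of how the two heartbeat parameters relate, the sketch below works out how many heartbeats go unanswered before the timeout is reached. The class `HeartbeatSettings` is hypothetical, the default values are the ones shown on the slide, and treating the timeout as "elapsed time is at least the limit" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatSettings:
    interval_seconds: int = 15  # global_heartbeat_interval_seconds
    timeout_minutes: int = 10   # global_shutdown_after_no_heartbeat_response_minutes

    def missed_before_timeout(self) -> int:
        """Number of heartbeat intervals that fit into the timeout window."""
        return (self.timeout_minutes * 60) // self.interval_seconds

    def timed_out(self, last_response_at: float, now: float) -> bool:
        """True once no response has been seen for the whole timeout window."""
        return now - last_response_at >= self.timeout_minutes * 60


settings = HeartbeatSettings()
missed = settings.missed_before_timeout()
```

With these values, a side has to miss 40 consecutive heartbeats before the timeout applies, so a heartbeat timeout usually points at a sustained outage or a stalled peer rather than a brief glitch.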


Remote configuration with Oracle source


1. Scraper on source, Apply on target
2. Scraper and Apply on target
3. Scraper and Apply on source
4. Scraper and Apply on CDC server (a separate server between the source server and the target server)

[Figure: for each option, the diagram places the Scraper, Apply, and Metadata components on the source server, target server, or CDC server, alongside the source tables, redo log, and target tables.]


Checkpoint:

Q1: If the transaction queue and/or staging store grows to a large size, what are some possible causes or areas we can check?

Q2: When are transaction queues and staging store data persisted to disk?

Q3: Describe the recoverability of ICDC and how the product can recover and continue replication without data loss in the event of database/network outages.

Q4: What are possible causes of a Heartbeat timeout error on the source?


Deploying CDC
Features and Configuration Details


Step 1: Define new Datastore


To define a new datastore (connect to a replication system), you need:
  Unique datastore name (name your datastores well!)
  Description
  Hostname
  Port (communication connection)
  Communication protocol (TCP/IP)
Optionally, you may need to know:
  Datastore type: source, target, or dual
  Platform and database type
  CDC version

[Figure labels: Datastore, Replication Engine, DBMS]

Add data store


Add a data store
  Register the CDC instance to be managed in Access Server.
Data store name
  It is generally easier to set: data store name = instance name = DB name.
Server name (the host where the CDC instance was created)
  Note: name resolution must work, with the same IP address and host name, from the opposite replication server as well as from the Access Server.
CDC instance port number
Click the Ping button to connect to the CDC instance; its property information will be displayed.
Specify a default user for the connection to the DB.


Step 2a: Define new user

Role
  System administrator (able to create new users and data stores)
  Administrator
  Operator
  Monitor


Step 2b: Set user options

Enable/disable accounts
Lock/unlock accounts
Force password change
Set password to never expire


CDC user role

CDC limits the operations a user can perform through four types of roles.

Roles (table columns): System administrator with "Enable user account and data store management"; System administrator; Administrator; Operator; Monitor

Table rows:
  Available perspectives: Access Manager, Configuration, Monitor
  Available operations: Create user or data store mapping; Change system parameters; Set up subscription; Start and end replication; Reference event log or statistics

[Table residue: the cells marking which role has which perspective and operation are not recoverable.]


Step 3a: Link datastores to users


Assign a CDC user to the data store.
Map this CDC user to a DB user: create a mapping between the CDC user connected to Access Server and the DB user connected to the database.


Step 3b: Enter database parameters

Shields database login names and passwords from users.

Platform | Required parameters
iSeries | Username, Password with access to MetaData
Oracle/Sybase | Oracle/Sybase SID, MetaData Owner, Password
OS/390 | Username, Password with access to MetaData
SQL Server | MetaData Database, Login, Password
DB2 LUW, Teradata | JDBC URL, MetaData Owner, Password


Checkpoint:

Q5: What levels of supplemental logging are required?

Q6: When does CDC enable and disable supplemental logging for the table selected for mirroring?

Q7: What Management Console operations can be performed by a Replication agent user with Operator privileges?


Understanding CDC Concepts


Defining a CDC Subscription


Table mapping
  Combination of a source table and a target table. Example: EMP table -> TGEMP table
  Mapping is available only for tables.
  Filtering and conversion are available in CDC.
Subscription
  A subscription is a connection that is required to replicate data between a source datastore and a target datastore. It contains details of the data that is being replicated and how the source data is applied to the target.
  Enable parallel processing by using multiple subscriptions.

[Figure: two layouts between the source engine and the target engine of a CDC instance. Left: one subscription processes multiple table mappings (A table -> TA table, B table -> TB table). Right: one subscription processes a single table mapping (D table -> TD table).]
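
The relationship between datastores, subscriptions, and table mappings can be written down as a small data model. This is only a sketch for orientation: the classes `TableMapping` and `Subscription`, and the datastore names `SRC` and `TGT`, are hypothetical and are not part of any CDC API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TableMapping:
    source_table: str
    target_table: str


@dataclass
class Subscription:
    name: str
    source_datastore: str
    target_datastore: str
    mappings: list = field(default_factory=list)

    def map_table(self, source_table, target_table):
        """Add one source-to-target table mapping and return self for chaining."""
        self.mappings.append(TableMapping(source_table, target_table))
        return self


# One subscription carrying several table mappings...
sub1 = Subscription("SUB1", "SRC", "TGT").map_table("A", "TA").map_table("B", "TB")
# ...and a second subscription with a single mapping, which can run in parallel.
sub2 = Subscription("SUB2", "SRC", "TGT").map_table("D", "TD")
```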


Mechanism to maintain referential integrity

[Figure: one subscription replicating two related tables from the source server to the target server.]

EMP table -> TGEMP table
EMPNO | NAME | DEPTNO
7500 | MIKE | 10
7499 | ALLEN | 10
7369 | SMITH | 20

DEPT table -> TGDEPT table
DEPTNO | DNAME
10 | ACCOUNTING
20 | RESEARCH

Transaction message: DEPT INSERT, EMP INSERT. Apply reflects the updated values in a transaction unit identical to the original transaction.

Defining the table mappings for integrity-related tables in the same subscription lets changes be applied in transaction units identical to the original transactions, thus maintaining referential integrity. This mechanism also maintains data consistency between target tables.


Guaranteed data consistency


Data transactions are applied at the target in the same order as they were generated at the source.
The target periodically sends an acknowledgement to ensure delivery.
The log position is recorded per operation in the CDC metadata.

Example: defining tables T1 and T2 in the same subscription

Transactions processed in the source DB | Transactions reflected in the target DB
TX1: INSERT T1 (A); INSERT T2 (B); COMMIT | INSERT T1 (A); INSERT T2 (B); record log position; COMMIT (TX1 processed)
TX2: INSERT T2 (A); INSERT T1 (B); COMMIT | INSERT T2 (A); INSERT T1 (B); record log position; COMMIT (TX2 processed)
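
The example above can be sketched in a few lines: transactions are applied in source order, and the log position is recorded together with each commit so that a restart does not apply anything twice. The class `Target`, the log positions 100 and 200, and the resend-after-restart scenario are assumptions made for illustration.

```python
class Target:
    """Toy target that applies whole transactions and remembers its position."""

    def __init__(self):
        self.rows = []     # applied operations, in apply order
        self.bookmark = 0  # log position of the last committed transaction

    def apply(self, transactions):
        for log_position, operations in transactions:
            if log_position <= self.bookmark:
                continue  # already committed before the restart, skip it
            self.rows.extend(operations)  # apply the transaction as one unit
            self.bookmark = log_position  # record the position with the commit


source_log = [
    (100, [("INSERT", "T1", "A"), ("INSERT", "T2", "B")]),  # TX1
    (200, [("INSERT", "T2", "A"), ("INSERT", "T1", "B")]),  # TX2
]

target = Target()
target.apply(source_log[:1])  # TX1 arrives, then the connection drops
target.apply(source_log)      # after restart the source resends from the start
```

Because the bookmark travels with the commit, the resent TX1 is recognised and skipped, and TX2 is applied exactly once after it.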


Advantages and constraints of multiple subscriptions


Multiple subscriptions can be processed in parallel, which can improve throughput.
Constraints
  Consistency and order of transactions are guaranteed only within a subscription.

  Example: separating T1 and T2 into different subscriptions
  Transaction processed in the source DB: INSERT T1 (A); INSERT T2 (B); COMMIT
  Transactions reflected in the target DB: INSERT T2 (B); COMMIT; INSERT T1 (A); COMMIT
  The transaction is separated, and the order of its parts may be switched.

  One table is the minimum definition unit for a subscription.
  Resources are consumed in subscription units.
  Logs are read per subscription.


Example of multiple subscription actions


[Figure: Single subscription: Table1 to Table4 on the source replicate to Table1 to Table4 on the target through one subscription (subA1), with one set of threads on each side. Multi subscription: the same tables are spread across subscriptions subB1, subB2 and subB3, each with its own source and target threads.]


Table Mapping/Target Apply Options


1. Standard (one to one)
   Source and target tables have similar table structures.
2. Audit Apply
   Generates an audit trail of data transactions from the source.
3. Adaptive Apply
   Automatically synchronizes data for dissimilar sources and targets.
4. Summarization
   Keeps a running total of numerical values at the target (accumulation or deduction).
5. Consolidation
   One-to-One: merges data from several tables into a single row.
   One-to-Many: used to apply a source lookup table change to all affected target rows.


LiveAudit
Source table:
Product ID | Action | Qty
Widget | Make | 1000
Widget | Calibrate Test Eqpmt | -
Widget | Test Initiated | 1000
Widget | Test Result: PASS / FAIL | 1000
Widget | Bottle | 1000
Widget | Ship | 1000

Audit table:
Date/Time | Actn | User | Product ID | Mfg Action | Qty
05/31/01-0800 | I | jwalker | Widget | Make | 1000
05/31/01-1300 | I | jwalker | Widget | Calibrate Test Eqpmt | -
05/31/01-1500 | I | jwalker | Widget | Test Initiated | 1000
06/01/01-0800 | U | jwalker | Widget | Test Result: Foreign Particulates Found | 1000
06/01/01-0900 | U | rtucker | Widget | Calibrate Test Eqpmt | -
06/01/01-1100 | U | rtucker | Widget | Test Initiated |
06/02/01-0800 | U | rtucker | Widget | Test Result: Pass | 1000
06/01/01-1600 | I | jwalker | Widget | Bottle | 1000
06/05/01-0800 | I | jwalker | Widget | Ship | 1000
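
The idea behind the audit table can be sketched as follows: every source change is appended as a new row together with journal information, instead of overwriting the existing target row. The function `audit_apply` and the dictionary keys are hypothetical; the sample values come from the table above.

```python
def audit_apply(audit_table, timestamp, entry_type, user, row):
    """Append one audit row per source change; nothing is updated in place."""
    audit_table.append(
        {"timestamp": timestamp, "action": entry_type, "user": user, **row}
    )


audit = []
audit_apply(audit, "05/31/01-0800", "I", "jwalker",
            {"product": "Widget", "mfg_action": "Make", "qty": 1000})
audit_apply(audit, "06/01/01-0800", "U", "jwalker",
            {"product": "Widget",
             "mfg_action": "Test Result: Foreign Particulates Found",
             "qty": 1000})
audit_apply(audit, "06/02/01-0800", "U", "rtucker",
            {"product": "Widget", "mfg_action": "Test Result: Pass", "qty": 1000})

# The earlier test result stays visible even though the source row was updated.
history = [r["mfg_action"] for r in audit if r["mfg_action"].startswith("Test Result")]
```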


Complex Updates: Summarization

CDC provides the ability to natively summarize numeric data into selected columns on a target table.
  Summarize using Accumulation
  Summarize using Deduction

Source table:
Key | Aisle | Qty | Price
1 | 1 | 12 | $1.25
2 | 1 | 5 | $0.30
3 | 4 | 15 | $12.00
4 | 1 | 2 | $10.30
5 | 4 | 40 | $9.20
6 | 2 | 10 | $4.60

Target table (summarization key is Aisle):
Aisle | Qty | Value
1 | 19 | $37.10
2 | 10 | $46.00
4 | 55 | $548.00

Qty = accumulated total of all Qty, summarized per Aisle.
Value = accumulated (Qty * Price), summarized per Aisle.
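
The totals in the target table can be reproduced with a short sketch. It assumes, for illustration, that an insert on the source accumulates into the running totals and a delete deducts from them; the function name `apply_change` is hypothetical.

```python
from decimal import Decimal

source_rows = [
    # (key, aisle, qty, price)
    (1, 1, 12, Decimal("1.25")),
    (2, 1, 5, Decimal("0.30")),
    (3, 4, 15, Decimal("12.00")),
    (4, 1, 2, Decimal("10.30")),
    (5, 4, 40, Decimal("9.20")),
    (6, 2, 10, Decimal("4.60")),
]


def apply_change(summary, operation, aisle, qty, price):
    """Accumulate on INSERT, deduct on DELETE, keyed by the summarization key."""
    sign = {"INSERT": 1, "DELETE": -1}[operation]
    row = summary.setdefault(aisle, {"qty": 0, "value": Decimal("0")})
    row["qty"] += sign * qty
    row["value"] += sign * qty * price


summary = {}
for _key, aisle, qty, price in source_rows:
    apply_change(summary, "INSERT", aisle, qty, price)
after_inserts = {aisle: dict(totals) for aisle, totals in summary.items()}

# Deleting source row 4 (aisle 1, qty 2, price 10.30) deducts it again.
apply_change(summary, "DELETE", 1, 2, Decimal("10.30"))
```

After the six inserts, `after_inserts` matches the target table above (19 and 37.10 for aisle 1, 10 and 46.00 for aisle 2, 55 and 548.00 for aisle 4).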

Complex Updates: Adaptive Apply

CDC can handle the following DML operations through adaptive apply.

On source | On target (row exists) | On target (row does not exist)
Insert | Update | Insert
Update | Update | Insert
Delete | Delete | Ignore

Uses of Adaptive Apply
  Data in the publication table is not consistent with the subscription table
  Restore data from the replication log
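
The decision table above translates directly into code. The sketch below models the target table as a dictionary keyed by the row key; the function `adaptive_apply` and its return labels are hypothetical names used only for illustration.

```python
def adaptive_apply(target, operation, key, row=None):
    """Apply a source operation to the target, adapting to whether the row exists."""
    exists = key in target
    if operation in ("INSERT", "UPDATE"):
        target[key] = row  # upsert: update if present, insert if missing
        return "update" if exists else "insert"
    if operation == "DELETE":
        if exists:
            del target[key]
            return "delete"
        return "ignore"  # nothing to delete, so the operation is skipped
    raise ValueError(f"unsupported operation: {operation}")


target = {7499: {"name": "ALLEN"}}
outcomes = [
    adaptive_apply(target, "INSERT", 7499, {"name": "ALLEN", "deptno": 10}),
    adaptive_apply(target, "UPDATE", 7500, {"name": "MIKE", "deptno": 10}),
    adaptive_apply(target, "DELETE", 7369),
    adaptive_apply(target, "DELETE", 7499),
]
```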


Complex Updates: Row Consolidation One-to-One

CDC provides the ability to merge rows from different tables into a single target table row.

Customer Table
Key | Fname | Lname | AddKey
1 | John | Grant | 1
2 | Julie | Grant-Smythe | 1
3 | Vikram | Ashby | 2

Address Table
Key | Address | City | Province | PostalCode
1 | 5 Main St. E. | Kingston | ON | K7L 8T3
2 | 14 Pineridge Ave. | Kelowna | BC | V3R 8Z9


Complex Updates: Row Consolidation One-to-One

CDC provides the ability to merge rows from different tables into a single target table row.

Customer Table
Key | Name | AKey
1 | John Grant | 1
2 | Julie Noor | 1

Address Table
Key | Address | City | Prov | PostCode
1 | 5 Main St. E. | Kingston | ON | K7L 8T3
2 | 14 Pineridge Ave. | Kelowna | BC | V3R 8Z9

Consolidation Key

Merged Table
Key | Name | Address | City | Prov | PostCode
1 | John Grant | 5 Main St. E. | Kingston | ON | K7L 8T3
2 | Julie Noor | 14 Pineridge Ave. | Kelowna | BC | V3R 8Z9
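
A minimal sketch of one-to-one consolidation: each source table fills in its own columns of the target row identified by the consolidation key. Judging from the merged table above, the sketch assumes the Key column of each source table is the consolidation key; the function `consolidate` is a hypothetical name.

```python
def consolidate(target, key, columns):
    """Merge one source row's columns into the target row with the same key."""
    target.setdefault(key, {}).update(columns)


customers = {1: {"name": "John Grant"}, 2: {"name": "Julie Noor"}}
addresses = {
    1: {"address": "5 Main St. E.", "city": "Kingston",
        "prov": "ON", "postcode": "K7L 8T3"},
    2: {"address": "14 Pineridge Ave.", "city": "Kelowna",
        "prov": "BC", "postcode": "V3R 8Z9"},
}

merged = {}
for key, row in customers.items():  # changes arriving from the Customer table
    consolidate(merged, key, row)
for key, row in addresses.items():  # changes arriving from the Address table
    consolidate(merged, key, row)
```

The order in which the two source tables deliver their changes does not matter here, because each table only touches its own columns of the merged row.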


Replication Modes: Refresh

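[Figure: replication modes. The engine reads the replication log and pushes changes either continuously (real time) or as a periodic net change; a refresh takes a full copy of the database table. This slide covers Refresh.]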


Replication Modes: Net Change

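[Figure: replication modes. The engine scrapes the replication log and pushes changes either continuously (real time) or as a periodic net change; a refresh takes a full copy of the database table. This slide covers Net Change.]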


Replication Modes: Continuous Mirroring

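[Figure: replication modes. The engine scrapes the replication log and pushes changes either continuously (real time) or as a periodic net change; a refresh takes a full copy of the database table. This slide covers Continuous Mirroring.]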


Status of table mapping

Status
  Refresh: all rows are copied in the next replication. The initial status is Refresh even if the replication method is mirroring.
  Active: only the changes (delta) are copied in the next replication.
  Parked: the mapping is ignored in the next replication.


Mapping of columns
Define the mapping of columns between source and target.
Conversion processes can be defined.
Columns are mapped to default values if there is no mapping.

Screenshot callouts:
  Define transformations.
  Set mappings with drag and drop.
  Automatic mapping for columns with the same name.
  Select journal log data (journal control fields).
  Input default values if there is no mapping.
  Specify key columns of the target.


Source Mappings Derived Columns

A column name, description, type and length are provided.
The expression is calculated on the source.
The result is replicated to the target.


Source Mappings Derived Columns Evaluation Frequency

After-Image Only: for an update, the derived column is calculated only for the after image.
Before and After Images: for an update, the derived column is calculated for both the before and after images. Essential if the source-derived column is used as a key on the target.


Target Mappings Derived Expressions

Check to add column data type details to source and target columns.
Used for non-standard mappings, e.g. by ordinal position.


Target Mappings Derived Expressions

Build expressions using a combination of:
  Built-in functions
  Operators
  Source columns
  Target columns
  Journal Control Columns
  Previously saved expressions


Derived Expression Functions


There are a number of built-in functions.

%BEFORE      Net change (before image)
%CURR        Current image
%CONCAT      Concatenation
%REPLACE     Character substitution
%SUBSTRING   Substring
%LOWER       Lower case character conversion
%UPPER       Upper case character conversion
%PROPER      Proper case character conversion
%LTRIM       Left trim blank characters
%RTRIM       Right trim blank characters
%TOCHAR      Convert to character
%TONUMBER    Convert to number
%TODATE      Convert date format
%TOTIME      Convert time format
%CENTURY     Add a 2 digit century to your date
%IF          Conditional
%VAR         Initialise a result variable
%USER        Call user exit program
%GETCOL      Get a column from another table
%STPROC      Call user exit stored procedure


Journal Control Fields


Information about the source row is replicated to the target system. This information can be pulled into the target table.

JOURNAL CONTROLS
&CCID        An identifier for the transaction.
&CNTRRN      Source table relative record number.
&CODE        'U' for refresh, 'R' for mirror.
&ENTTYP      Indicates the type of update.
&JOB         Source job making the update.
&JOBNO       The OS user Id of the update process.
&JOBUSER     The OS user at the time of the update.
&JOURNAL     Name of the replication log.
&JRNFLG      Indicates if before image is present.
&JRNLIB      The name of the replication log schema.
&LIBRARY     The source table schema or its alias.
&MEMBER      The source member name or its alias.
&PROGRAM     Name of the updating program.
&OBJECT      The source table name or its alias.
&SEQNO       Replication log sequence number.
&SYSTEM      The hostname of the source system.
&TIMSTAMP    Time of the update or refresh.
&USER        The user ID which made the update.


Filtering of rows and columns


Filtering of rows
  Input WHERE conditions.
Filtering of columns
  By default, all columns are selected for replication.
  Deselect the columns that are not used for replication to reduce traffic volume.
Critical Column Filtering
  By default, all selected columns are flagged as critical.
  When columns are flagged as critical, an update is applied to the target only if data in a critical column changes.
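
Critical column filtering amounts to comparing the before and after images on the critical columns only. The sketch below illustrates that check; the function `should_replicate_update` and the sample column names are hypothetical.

```python
def should_replicate_update(before, after, critical_columns):
    """Replicate an update only if at least one critical column changed."""
    return any(before.get(col) != after.get(col) for col in critical_columns)


before = {"id": 1, "price": 10, "last_viewed": "2010-01-01"}
after = {"id": 1, "price": 10, "last_viewed": "2010-01-02"}

# Only last_viewed changed, and it is not flagged as critical.
skipped = should_replicate_update(before, after, ["price"])
# With every column critical (the default), the same update is replicated.
sent = should_replicate_update(before, after, ["id", "price", "last_viewed"])
```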


Conflict Resolution

Conflict detection and resolution allows you to detect, log, and act on inconsistent data.

This ensures your replication environment handles data conflicts automatically and in accordance with your business rules.

This feature allows you to choose which columns the replication process detects conflicts in, and also how to automatically resolve the conflicts when they are detected.

As conflicts are detected and resolved, CDC logs them in a conflict resolution audit table (TS_CONFAUD).


Conflict Detection
CDC detects the following conflicts as it replicates data from the source to the target:
  Inserting a row where the row's key already exists in the subscription table. This violates the unique key constraint.
  Updating a row where the row's key does not exist in the subscription table.
  Updating a row where the contents of the rows in the publication table and subscription table, before the update, do not match.
  Deleting a row where the row's key does not exist in the subscription table.
  Deleting a row where the contents of the rows in the publication table and subscription table, before the delete, do not match.

Conflict Resolution: Unsupported Types

CDC will not detect conflicts in columns that are:
  Mapped to derived expressions using the %BEFORE, %CURR, %GETCOL, %STPROC, and %USER functions.
  Mapped to journal control fields.
  Initialized during insert to any value.
  Created with the following data types:
    (SQL Server only) IMAGE, NTEXT, and TEXT.
    (AS/400 only) BLOB, CLOB, and DBCLOB.
    (Oracle only) BLOB, CLOB, LONG, LONG RAW, and NCLOB.


Conflict Resolution Methods

The following methods are available to CDC when conflicts occur:
  None: reports the conflict as an error to the event log.
  Source wins: applies the row from the publication table to the subscription table.
  Target wins: does not apply any changes to the subscription table.
  Largest value wins: applies the row with the larger value in its comparison column to the subscription table.
  Smallest value wins: applies the row with the smaller value in its comparison column to the subscription table.
  User exit: passes the conflicting rows to a user exit program and applies the image returned by the user exit program to the subscription table.
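
To make the methods concrete, the sketch below resolves one kind of conflict: an update whose before image no longer matches the row on the target. It is an illustration under stated assumptions, not the product's logic: the function `resolve_update_conflict` and the method labels are hypothetical, the user exit method is left out, and raising an exception stands in for reporting an error to the event log.

```python
def resolve_update_conflict(method, source_before, source_after, target_row,
                            compare_column=None):
    """Return the row image that should end up on the target after an update."""
    if target_row == source_before:
        return source_after  # images match, so there is no conflict
    if method == "none":
        raise RuntimeError("conflict detected: reported as an error")
    if method == "source_wins":
        return source_after
    if method == "target_wins":
        return target_row
    if method in ("largest_value_wins", "smallest_value_wins"):
        pick = max if method == "largest_value_wins" else min
        return pick(source_after, target_row, key=lambda r: r[compare_column])
    raise ValueError(f"unsupported method: {method}")


before = {"id": 1, "qty": 10}
after = {"id": 1, "qty": 12}
changed_on_target = {"id": 1, "qty": 15}  # someone modified the target row

kept_source = resolve_update_conflict("source_wins", before, after, changed_on_target)
kept_target = resolve_update_conflict("target_wins", before, after, changed_on_target)
largest = resolve_update_conflict("largest_value_wins", before, after,
                                  changed_on_target, compare_column="qty")
smallest = resolve_update_conflict("smallest_value_wins", before, after,
                                   changed_on_target, compare_column="qty")
```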

Value Comparison Column

This field is only available if you use the Largest value wins or Smallest value wins resolution methods.

Select the column the conflict resolution method will use when comparing values.

You must select a column you are detecting conflicts on.


Operation
Control of application operations and refresh operations
  Settings such as disabling Delete operations are possible.
  The available options depend on the table mapping type.
Row-level operations
  Set whether or not each operation is available.
Table-level operations
  Alter the default operation for Insert, Update and Delete.
  Delete all (default)
  Other options are Do not delete and Delete selected rows.


User Exit Support


[Figure: user exit points. On the publisher tables: 1 = row selection, 2 = derived columns. On the subscriber tables: 3 = derived columns, 4 = row level.]


Row Level User Exits


Features
  External programs can be executed before or after an operation, refresh, or commit.
  Conditions can be evaluated or manipulated using the values before or after the update, or values in journal control columns.
  Stored procedures and Java programs are supported.
  Consistency can be maintained when targets are updated, because the user exit is processed in the same unit of work (UOW) as the transaction.
Example applications
  Make conversions that cannot be made with standard functions.
  Write values to tables other than the target tables.
  Reference external databases or files.
  Output information to external databases or files.
  Write values to databases other than the standard supported databases.

Best Practice: Only use user exits when absolutely necessary. User exits (UEs) are owned and maintained by the customer.


(Reference) Stored procedure: User Exit example


-- result and returnMsg are the only two mandatory parameters.
-- Of the remaining parameters, it is possible to declare only the ones required.
create or replace procedure SP_2_1_2 (
    result    OUT INT,       -- Return value: 0 if successful.
    returnMsg OUT varchar2,  -- Return value: event is output if failed.
    s$entry   IN NUMBER,     -- Indicates the operation.
    d$PK      IN OUT INT,    -- Post-conversion value in CDC. Because it is also an OUT
                             -- parameter, a changed value is reflected in subsequent DML.
    a$ID      IN INT,        -- AFTER value
    b$ID      IN INT,        -- BEFORE value
    a$C1      IN CHAR,
    a$C2      IN CHAR,
    a$C3      IN CHAR,
    a$C4      IN CHAR)
IS
BEGIN
    -- AFTER INSERT, UPDATE, DELETE: the change is written to another table.
    CASE s$entry
        WHEN 4 THEN insert into TGT_PATTERN2_1_2 values (d$PK, a$ID, a$C3, a$C4);
        WHEN 6 THEN update TGT_PATTERN2_1_2 set pk=d$pk, C3=a$C3, C4=a$C4 where id=b$id;
        WHEN 8 THEN delete from TGT_PATTERN2_1_2 where id=b$id;
    END CASE;

    result := 0;  -- Indicates successful processing
    returnMsg := 'OK';
END SP_2_1_2;
/


Checkpoint:

Q8: Under what conditions does it make sense to add an additional subscription?

Q9: What is the difference between a derived column and a derived expression? Give an example of when you would use each option.

Q10: Match each of the following apply methods with its description.
Apply methods:
  a) Live Audit
  b) Adaptive Apply
  c) Summarization
  d) Consolidation 1-1
  e) Consolidation 1-many
Descriptions:
  A) The key can contain duplicate values and, if duplicates exist, a publication row-level operation can result in more than one row in the subscription table being changed.
  B) Simplify accounting, statistical, and other business operations that require intensive addition and subtraction of numeric data. Applications designed to generate more complex calculations for reports and other purposes can work directly with the currently maintained totals.
  C) Use in environments where data in the publication table is not consistent with the data in the assigned subscription table. Inconsistency may be a result of external applications modifying these tables independently.
  D) Track changes made to the source table. Use additional target columns to hold journal information.
  E) The unique identification numbers in the key column ensure that a publication row-level operation will affect only one row in the subscription table.


QUESTIONS?
