
Insert / Update ordering in Informatica mappings

ETL-Performance.com
Stephen Barr

Does the order of inserts & updates to a target make a substantial difference to the overall performance
of the mapping, and perhaps more importantly to the overall scalability of the solution? The resounding
answer is YES!

Test case
My source and target tables exist in the same database but within different schemas. I’ve designed the
data such that 50% of the rows from the source will be updates and 50% will be inserts.

SOURCE@INFADB>select count(*)
2 from insert_update_source
3 /

COUNT(*)
----------
202992

Elapsed: 00:00:01.45
SOURCE@INFADB>select action, count(*)
2 from insert_update_source
3 group by action
4 /

ACTION COUNT(*)
------ ----------
UPDATE 101496
INSERT 101496

Elapsed: 00:00:00.48

Using these sources and targets I created two mappings.

Mapping 1 – interleaved inserts / updates

In this mapping, the target receives an insert, then an update, then an insert, and so on. This has been designed to represent a worst-case scenario.

Mapping 2 – inserts / updates routed to separate targets

In this mapping, there are two versions of the target. The inserts are routed to one target, while the
updates are routed to the other. We then use the “Target Load Plan” to choose which one we should
load first.
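Conceptually, the second mapping is just a partition of the row stream on the action flag before loading. A minimal Python sketch of that routing step (illustrative only, not Informatica's actual Router transformation; the `action` field name matches the test tables at the end of this document):

```python
# Split a stream of rows into inserts and updates, mimicking the routing
# that feeds the two target instances in mapping 2.
def route_rows(rows):
    inserts = [r for r in rows if r["action"] == "INSERT"]
    updates = [r for r in rows if r["action"] == "UPDATE"]
    return inserts, updates

rows = [{"id": i, "action": "INSERT" if i % 2 == 0 else "UPDATE"}
        for i in range(6)]
inserts, updates = route_rows(rows)
print(len(inserts), len(updates))  # 3 3
```

Each partition can then be loaded in whichever order the Target Load Plan dictates.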
The scripts for creating the source and target tables are available at the bottom of this document.

Results

Overall run times –

Mapping 1 – 6 minutes 14 seconds
Mapping 2 – 2 minutes 25 seconds

As you can see, there is a massive difference in the run times between the two mappings. Obviously, something fundamental is happening in the first mapping which is making it perform so poorly – and from looking at the Oracle trace files we can see exactly what the issue is.

From the trace of the target we can see the overall statistics for the insert statement from the first
mapping –

INSERT INTO INSERT_UPDATE_TARGET(ID,OWNER,OBJECT_NAME,SUBOBJECT_NAME,
OBJECT_ID,DATA_OBJECT_ID,OBJECT_TYPE,CREATED,LAST_DDL_TIME,TIMESTAMP,STATUS,
TEMPORARY,GENERATED,SECONDARY,ACTION)
VALUES
( :1, :2, :3, :4, :5, :6, :7, :8, :9, :10, :11, :12, :13, :14, :15)

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0          0
Execute 100922     26.60      27.14         46       1990     323326     101496
Fetch        0      0.00       0.00          0          0          0          0
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total   100923     26.60      27.14         46       1990     323326     101496

We can see that there were 100922 executions of the insert statement, resulting in 323326 current block gets.

However, if we look at the second mapping –

INSERT INTO INSERT_UPDATE_TARGET(ID,OWNER,OBJECT_NAME,SUBOBJECT_NAME,
OBJECT_ID,DATA_OBJECT_ID,OBJECT_TYPE,CREATED,LAST_DDL_TIME,TIMESTAMP,STATUS,
TEMPORARY,GENERATED,SECONDARY,ACTION)
VALUES
( :1, :2, :3, :4, :5, :6, :7, :8, :9, :10, :11, :12, :13, :14, :15)

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0          0
Execute    705      3.50       5.50          1       4005      28802     101496
Fetch        0      0.00       0.00          0          0          0          0
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total      706      3.50       5.50          1       4005      28802     101496

You can see that there were only 705 executions of the insert statement, with only 28802 current block gets. If we stack the figures up side by side, the contrast is even starker –

                  Map 1 – insert   Map 2 – insert   Map 1 – update   Map 2 – update
Executions                100922              705           101497           101497
CPU time (s)                26.6              3.5            37.39            30.68
Elapsed time (s)           27.14              5.5            50.58            42.67
Block gets                323326            28802           110023           107386

As you can see, there is a huge difference in the inserts – especially when it comes to cpu time and the
number of block gets. The reason? Array inserts.

Informatica uses the native Oracle Call Interface (OCI) to communicate with the Oracle server. One of the features of the OCI is its ability to allow a client to perform array inserts / updates. This means that for a single “execution” of the statement, multiple rows of data are processed. We can see this is happening because the rows-to-executions ratio for our insert statement is > 1.
In fact, the average array size in this case works out at ~144 rows of data (101496 rows over 705 executions). These array operations are much more efficient than ordinary single-row insert operations.
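The array-size arithmetic is just the tkprof figures divided out; a quick check in Python (the row and execution counts are taken from the statistics above):

```python
# rows / executions from the tkprof output gives the average array size.
map1_rows, map1_execs = 101496, 100922   # mapping 1: interleaved
map2_rows, map2_execs = 101496, 705      # mapping 2: routed

print(round(map1_rows / map1_execs, 2))  # 1.01 - effectively one row per call
print(round(map2_rows / map2_execs, 2))  # 143.97 - large array inserts
```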

So why is one mapping performing array operations while the other is not? Informatica has implemented its OCI interface in a very simple, generic way. If an insert statement is received by the writer process, it will start to build an array. If another insert statement comes through, it is simply added to the existing array. When the array is full, Informatica sends that array to Oracle for processing as a single message. However, if the writer is in the middle of building an array of inserts and receives an update, Informatica will send the insert array as it currently stands, followed by the update. Therefore, if we have interleaved inserts and updates, we are effectively not using arrays at all.
We can see this from the raw trace files –
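The flush-on-change behaviour described above can be modelled in a few lines of Python. This is a sketch of the described algorithm, not Informatica's actual writer code, and the array size of 200 is an arbitrary assumption:

```python
def count_executions(ops, array_size=200):
    """Simulate a writer that appends consecutive identical operations to an
    array and flushes it (one OCI execution) when the operation type changes
    or the array fills up."""
    execs = 0
    current_op, pending = None, 0
    for op in ops:
        if op != current_op and pending:
            execs += 1              # partial array flushed on op change
            pending = 0
        current_op = op
        pending += 1
        if pending == array_size:   # full array sent as a single message
            execs += 1
            pending = 0
    if pending:
        execs += 1                  # final flush at end of load
    return execs

interleaved = ["INSERT", "UPDATE"] * 1000        # mapping 1's worst case
grouped = ["INSERT"] * 1000 + ["UPDATE"] * 1000  # mapping 2's ordering
print(count_executions(interleaved))  # 2000 - one execution per row
print(count_executions(grouped))      # 10 - 2 x (1000 / 200)
```

The same row count produces wildly different execution counts purely because of ordering, which is exactly what the tkprof figures show.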

In mapping 1, we can see the inserts and updates are interleaved almost perfectly –

EXEC #1:c=0,e=287,p=0,cr=0,cu=3,mis=0,r=1,dep=0,og=1,tim=26087356343
WAIT #1: nam='SQL*Net message to client' ela= 6 driver id=1413697536 #bytes=1 p3=0
obj#=-1 tim=26087356647
WAIT #1: nam='SQL*Net message from client' ela= 873 driver id=1413697536 #bytes=1 p3=0
obj#=-1 tim=26087357719
EXEC #2:c=0,e=330,p=0,cr=2,cu=1,mis=0,r=1,dep=0,og=1,tim=26087358483
WAIT #2: nam='SQL*Net message to client' ela= 6 driver id=1413697536 #bytes=1 p3=0
obj#=-1 tim=26087358803
WAIT #2: nam='SQL*Net message from client' ela= 877 driver id=1413697536 #bytes=1 p3=0
obj#=-1 tim=26087359884
EXEC #1:c=0,e=268,p=0,cr=0,cu=3,mis=0,r=1,dep=0,og=1,tim=26087360720
WAIT #1: nam='SQL*Net message to client' ela= 6 driver id=1413697536 #bytes=1 p3=0
obj#=-1 tim=26087361027
WAIT #1: nam='SQL*Net message from client' ela= 885 driver id=1413697536 #bytes=1 p3=0
obj#=-1 tim=26087362116
EXEC #2:c=0,e=331,p=0,cr=2,cu=1,mis=0,r=1,dep=0,og=1,tim=26087362877
WAIT #2: nam='SQL*Net message to client' ela= 7 driver id=1413697536 #bytes=1 p3=0
obj#=-1 tim=26087363197
WAIT #2: nam='SQL*Net message from client' ela= 869 driver id=1413697536 #bytes=1 p3=0
obj#=-1 tim=26087364264

EXEC #1 is our insert, EXEC #2 is our update.
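Tallying this pattern mechanically is straightforward; a small Python sketch that counts executions and rows per cursor from the `EXEC #n:` lines of a raw 10046 trace file:

```python
import re
from collections import Counter

# Matches "EXEC #<cursor>:..." and captures the r=<rows> field.
EXEC_RE = re.compile(r"EXEC #(\d+):.*?,r=(\d+),")

def tally(trace_lines):
    """Return (executions, rows) Counters keyed by cursor number."""
    execs, rows = Counter(), Counter()
    for line in trace_lines:
        m = EXEC_RE.match(line)
        if m:
            cursor, r = m.group(1), int(m.group(2))
            execs[cursor] += 1
            rows[cursor] += r
    return execs, rows

sample = [
    "EXEC #1:c=0,e=287,p=0,cr=0,cu=3,mis=0,r=1,dep=0,og=1,tim=26087356343",
    "EXEC #2:c=0,e=330,p=0,cr=2,cu=1,mis=0,r=1,dep=0,og=1,tim=26087358483",
    "EXEC #1:c=0,e=268,p=0,cr=0,cu=3,mis=0,r=1,dep=0,og=1,tim=26087360720",
]
e, r = tally(sample)
print(dict(e))  # {'1': 2, '2': 1}
```

In mapping 1 every execution carries r=1; in mapping 2 the rows-per-execution climb as the arrays fill.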

However, looking at the trace file from mapping 2, we can see that the operations are grouped together –

EXEC #1:c=0,e=205,p=0,cr=2,cu=1,mis=0,r=1,dep=0,og=1,tim=28092846779
WAIT #1: nam='SQL*Net message to client' ela= 4 driver id=1413697536 #bytes=1 p3=0
obj#=-1 tim=28092846876
WAIT #1: nam='SQL*Net message from client' ela= 488 driver id=1413697536 #bytes=1 p3=0
obj#=-1 tim=28092847418
EXEC #1:c=0,e=264,p=0,cr=2,cu=1,mis=0,r=1,dep=0,og=1,tim=28092847781
WAIT #1: nam='SQL*Net message to client' ela= 5 driver id=1413697536 #bytes=1 p3=0
obj#=-1 tim=28092847887
WAIT #1: nam='SQL*Net message from client' ela= 425 driver id=1413697536 #bytes=1 p3=0
obj#=-1 tim=28092848366
EXEC #1:c=0,e=209,p=0,cr=2,cu=1,mis=0,r=1,dep=0,og=1,tim=28092848672
WAIT #1: nam='SQL*Net message to client' ela= 4 driver id=1413697536 #bytes=1 p3=0
obj#=-1 tim=28092848771
WAIT #1: nam='SQL*Net message from client' ela= 414 driver id=1413697536 #bytes=1 p3=0
obj#=-1 tim=28092849238
EXEC #1:c=0,e=207,p=0,cr=2,cu=1,mis=0,r=1,dep=0,og=1,tim=28092849540
WAIT #1: nam='SQL*Net message to client' ela= 4 driver id=1413697536 #bytes=1 p3=0
obj#=-1 tim=28092849638
WAIT #1: nam='SQL*Net message from client' ela= 403 driver id=1413697536 #bytes=1 p3=0
obj#=-1 tim=28092850093
EXEC #1:c=0,e=207,p=0,cr=2,cu=1,mis=0,r=1,dep=0,og=1,tim=28092850387

We can see the massive difference in the traffic generated between Informatica and Oracle when comparing the two mappings –

Mapping 1 –

Event waited on                   Times Waited  Max. Wait  Total Waited
SQL*Net message to client               100922       0.02          0.74
SQL*Net message from client             100922       0.08         63.88

Mapping 2 –

Event waited on                   Times Waited  Max. Wait  Total Waited
SQL*Net message to client                  705       0.00          0.00
SQL*Net message from client                705       0.08          5.02

This is a massive reduction in the time the mapping spends communicating with the Oracle database – and if we scale these figures up to production volumes, you can see that this sort of issue needs to be taken seriously.
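To put a number on it: the wait summaries above give an average cost per round trip, and when every row is its own round trip that cost scales linearly with row count. The 10-million-row projection below is a hypothetical illustration, not a measured figure:

```python
# 'SQL*Net message from client' figures from the mapping 1 wait summary.
trips, total_wait = 100922, 63.88        # round trips, total seconds waited

per_trip = total_wait / trips            # average seconds per round trip
print(round(per_trip * 1000, 3), "ms")   # roughly 0.633 ms each

# Hypothetical projection: network wait alone for a 10M-row interleaved load.
print(round(per_trip * 10_000_000 / 60, 1), "minutes")
```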

Those of you with a good eye will have spotted something a bit strange. The figures for the update
statement are effectively the same for both mappings. It actually looks like Informatica does not
support array updates. If this is true then it seems like a glaring hole in its OCI implementation.
However, if you have evidence to the contrary let me know!

Implications

This was a very contrived test on a small, single-CPU box. I was using relatively small volumes and very simple structures. However, the Oracle trace files reflect the magnitude of the difference between the two approaches, and that difference will hold true even for the biggest of systems or the most complex of mappings.

It’s very easy to detect whether or not you’re experiencing these issues, and if you are, the performance benefit you could gain from separating your inserts and updates could be fantastic – especially given how little effort is required to make a change like this.

Scripts to create the source & target tables –

SOURCE

create sequence insert_update_seq;

create table insert_update_source
as
select insert_update_seq.nextval as id,
owner,
object_name,
subobject_name,
object_id,
data_object_id,
object_type,
created,
last_ddl_time,
timestamp,
status,
temporary,
generated,
secondary,
decode(mod(rownum,2),0,'INSERT','UPDATE') as action
from dba_objects
/

insert into insert_update_source
(
select insert_update_seq.nextval,
owner,
object_name,
subobject_name,
object_id,
data_object_id,
object_type,
created,
last_ddl_time,
timestamp,
status,
temporary,
generated,
secondary,
decode(mod(rownum,2),0,'INSERT','UPDATE') as action
from insert_update_source
)
/
rem each extra "/" re-executes the insert, doubling the row count
/
/
commit;

exec dbms_stats.gather_table_stats(user, 'INSERT_UPDATE_SOURCE');

TARGET

create table insert_update_target
(
ID NUMBER,
OWNER VARCHAR2(30),
OBJECT_NAME VARCHAR2(128),
SUBOBJECT_NAME VARCHAR2(30),
OBJECT_ID NUMBER,
DATA_OBJECT_ID NUMBER,
OBJECT_TYPE VARCHAR2(19),
CREATED DATE,
LAST_DDL_TIME DATE,
TIMESTAMP VARCHAR2(19),
STATUS VARCHAR2(7),
TEMPORARY VARCHAR2(1),
GENERATED VARCHAR2(1),
SECONDARY VARCHAR2(1),
ACTION VARCHAR2(6)
)
/

insert into insert_update_target
(
select *
from source.insert_update_source
where action = 'UPDATE'
)
/

commit;

create index id_idx on insert_update_target(id);

exec dbms_stats.gather_table_stats(user, 'INSERT_UPDATE_TARGET', cascade=>TRUE);
