Performance Tuning in Informatica

Performance Tuning: The goal of performance tuning is to eliminate performance
bottlenecks.
First identify a performance bottleneck, eliminate it, and then identify
the next performance bottleneck, repeating until session performance is acceptable.
You can use the test load option in the session properties (Session --> Properties).
The most common performance bottleneck occurs when the Power Center Server
writes to a target database. You can identify performance bottlenecks by the following
methods:
1. Running test sessions: You can configure a test session to read from a flat file source
or to write to a flat file target to identify source and target bottlenecks.
2. Studying performance details: You can create a set of information called
performance details to identify session bottlenecks. Performance details provide
information such as buffer input and output efficiency.
3. Monitoring system performance: You can use system monitoring tools to view
percent CPU usage, I/O waits, and paging to identify system bottlenecks.
Once you determine the location of a performance bottleneck, you can eliminate
the bottleneck by following these guidelines:
1. Eliminate source and target database bottlenecks: Have the database administrator
optimize database performance by optimizing the query, increasing the database network
packet size, or configuring index and key constraints.
2. Eliminate mapping bottlenecks: Fine tune the pipeline logic and transformation
settings and options in mappings to eliminate mapping bottlenecks.
3. Eliminate session bottlenecks: You can optimize the session strategy and use
performance details to help tune session configuration.
4. Eliminate system bottlenecks: Have the system administrator analyze information
from system monitoring tools and improve CPU and network performance.
The first step in performance tuning is to identify the performance bottleneck.
Performance bottlenecks can occur in the source and target databases, the mapping, the
session, and the system. Generally, you should look for performance bottlenecks in the
following order:
Target
Source
Mapping
Session
System
You can identify performance bottlenecks by running test sessions, viewing performance
details, and using system monitoring tools.
1. Identifying Target Bottlenecks: The most common performance bottleneck occurs
when the Power Center Server writes to a target database. You can identify target
bottlenecks by configuring the session to write to a flat file target. If the session
performance increases significantly when you write to a flat file, you have a target
bottleneck.
If your session already writes to a flat file target, you probably do not have a target
bottleneck. You can optimize session performance by writing to a flat file target local to
the Power Center Server.
Causes for a target bottleneck may include small checkpoint intervals, small database
network packet size, or problems during heavy loading operations. The following tasks
can improve performance when you have a target bottleneck:
Drop or Disable indexes and constraints.
Use bulk loading.
Use external loading.
Increase database network packet size.
Increase commit intervals
Optimize Oracle target databases.
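Two of the tasks above (dropping indexes before the load and increasing the commit interval) can be illustrated outside Informatica. The following is a minimal sketch only, using Python's built-in sqlite3 module as a stand-in for the target database. The table sales_tgt, the index idx_sales_id, and the interval of 250 rows are hypothetical names and values chosen for illustration; in a real session these steps would be pre-session and post-session SQL plus the session commit interval.

```python
import sqlite3

def load_with_commit_interval(conn, rows, commit_interval):
    """Insert rows and commit once every `commit_interval` rows.

    Returns the number of commits issued, so the effect of a larger
    interval (fewer commits) is easy to see.
    """
    commits = 0
    pending = 0
    for row in rows:
        conn.execute("INSERT INTO sales_tgt (id, amount) VALUES (?, ?)", row)
        pending += 1
        if pending == commit_interval:
            conn.commit()
            commits += 1
            pending = 0
    if pending:  # commit whatever is left after the last full interval
        conn.commit()
        commits += 1
    return commits

# Hypothetical target table with one index (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_tgt (id INTEGER, amount REAL)")
conn.execute("CREATE INDEX idx_sales_id ON sales_tgt (id)")
rows = [(i, i * 1.5) for i in range(1, 1001)]

# Step 1: drop the index so it is not maintained for every inserted row.
conn.execute("DROP INDEX idx_sales_id")

# Step 2: load with a larger commit interval (1000 rows / 250 = 4 commits).
commits = load_with_commit_interval(conn, rows, commit_interval=250)

# Step 3: rebuild the index once, after the load.
conn.execute("CREATE INDEX idx_sales_id ON sales_tgt (id)")
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM sales_tgt").fetchone()[0]
index_names = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'")]
```

With a commit interval of 10 the same load would issue 100 commits instead of 4, which is the overhead a larger commit interval is meant to avoid.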
2. Identifying Source Bottlenecks: Performance bottlenecks can occur when the
Power Center Server reads from a source database. If your session reads from a flat file
source, you probably do not have a source bottleneck. You can improve session
performance by setting the number of bytes the Power Center Server reads per line if you
read from a flat file source.
If the session reads from relational source, you can use a filter transformation, a
read test mapping, or a database query to identify source bottlenecks.
A. Using Filter Transformation: Add a filter transformation in the mapping after each
source qualifier. Set the filter condition to false so that no data is processed past the filter
transformation. If the time it takes to run the new session remains about the same, then
you have a source bottleneck.
B. Using Read Test Mapping: You can create a read test mapping to identify source
bottlenecks. A read test mapping isolates the read query by removing the transformations
in the mapping. Use the following steps to create a read test mapping:
Make a copy of the original mapping. In the copied mapping, keep only the sources,
source qualifiers, and any custom joins or queries. Remove all transformations. Connect
the source qualifiers to a file target.
C. Using Database Query: You can identify source bottlenecks by executing the read
query directly against the source database. Copy the read query directly from the session
log. Execute the query against the source database with a query tool. Measure the query
execution time and the time it takes for the query to return the first row. If there is a long
delay between the two time measurements, you can use an optimizer hint to eliminate the
source bottleneck.
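The two measurements described in method C (time to the first row and total execution time) can be taken with a small script. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for the source database; the table src and its contents are made up for illustration, and in practice you would run the read query copied from the session log through the driver for your own database.

```python
import sqlite3
import time

def time_query(conn, sql):
    """Return (seconds_to_first_row, seconds_to_all_rows, row_count)."""
    start = time.perf_counter()
    cur = conn.execute(sql)
    first = cur.fetchone()               # time until the first row is returned
    first_row_s = time.perf_counter() - start
    count = 0 if first is None else 1
    for _ in cur:                         # drain the remaining rows
        count += 1
    total_s = time.perf_counter() - start
    return first_row_s, total_s, count

# Hypothetical source table used only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO src VALUES (?, ?)",
                 ((i, f"name{i}") for i in range(10000)))
conn.commit()

first_row_s, total_s, row_count = time_query(
    conn, "SELECT id, name FROM src ORDER BY name")
print(f"first row: {first_row_s:.4f}s, all rows: {total_s:.4f}s, rows: {row_count}")
```

A large gap between the two timings is the signal the text refers to when it suggests trying an optimizer hint.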
Causes for a source bottleneck may include an inefficient query or small database
network packet sizes. The following tasks can improve performance when you have a
source bottleneck:
Use conditional filters.
Use Indexes wherever possible
Increase database network packet size.
Optimize the query.
Create tempdb as in-memory database.
Connect to Oracle databases using IPC protocol.
3. Identifying Mapping Bottlenecks: If you determine that you do not have a source or
target bottleneck, you might have a mapping bottleneck. You can identify mapping
bottlenecks by using a Filter transformation in the mapping.
If you determine that you do not have a source bottleneck, you can add a Filter
transformation in the mapping before each target definition. Set the filter condition to
false so that no data is loaded into the target tables. If the time it takes to run the new
session is the same as the original session, you have a mapping bottleneck.
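The test sessions described so far (flat file target, Filter after each Source Qualifier, Filter before each target) can be combined into one decision procedure that follows the Target, Source, Mapping order. The sketch below is only an illustration: the function name locate_bottleneck and both thresholds (a 25% gain counted as "significant", a 10% difference counted as "about the same") are assumptions, not values from Informatica.

```python
def locate_bottleneck(original_s, flat_file_target_s, source_filter_s,
                      target_filter_s, significant_gain=0.25, tolerance=0.10):
    """Classify a bottleneck from the run times (in seconds) of test sessions.

    original_s         -- run time of the unmodified session
    flat_file_target_s -- run time when writing to a flat file target
    source_filter_s    -- run time with a FALSE filter after each Source Qualifier
    target_filter_s    -- run time with a FALSE filter before each target
    Both thresholds are assumed values for illustration.
    """
    def about_same(test_s):
        return abs(test_s - original_s) <= tolerance * original_s

    # 1. Target: writing to a flat file is significantly faster.
    if flat_file_target_s < original_s * (1 - significant_gain):
        return "target"
    # 2. Source: discarding all rows right after the read changes nothing.
    if about_same(source_filter_s):
        return "source"
    # 3. Mapping: discarding all rows just before the target changes nothing.
    if about_same(target_filter_s):
        return "mapping"
    # Otherwise look at the session configuration and the system.
    return "session or system"

print(locate_bottleneck(100, 40, 30, 60))   # target
print(locate_bottleneck(100, 95, 98, 99))   # source
print(locate_bottleneck(100, 95, 30, 97))   # mapping
```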
You can also identify mapping bottlenecks by using performance details. High
Errorrows and Rowsinlookupcache counters indicate a mapping bottleneck. The following
guidelines can improve performance when you have a mapping bottleneck:
Mapping Level optimization may take time to implement but can significantly
boost session performance. Generally, you reduce the number of transformations in the
mapping and delete unnecessary links between transformations to optimize the mapping.
You should configure the mapping with the least number of transformations and
expressions to do the most amount of work possible. You should minimize the amount of
data moved by deleting unnecessary links between transformations.
For transformations that use data cache (such as Aggregator, Joiner, Rank, and
Lookup transformations), limit connected input/output or output ports. Limiting the
number of connected input/output or output ports reduces the amount of data the
transformations store in the data cache. You can also perform the following tasks to
optimize the mapping:
Configure single-pass reading.
Optimize data type conversions.
Eliminate transformation errors.
Optimize transformations.
Optimize expressions.
4. Identifying Session Bottlenecks: If you do not have a source, target, or mapping
bottleneck, you may have a session bottleneck. You can identify a session bottleneck by
using the performance details. The Power Center Server creates performance details
when you enable Collect Performance Data in the Performance settings on the Properties
tab of the session properties.
Performance details display information about each Source Qualifier, target
definition, and individual transformation. All transformations have some basic counters
that indicate the number of input rows, output rows, and error rows. Small cache size,
low buffer memory, and small commit intervals can cause session bottlenecks. You can
perform the following tasks to increase session performance:
Increase the cache size.
Run concurrent sessions.
Partition the session.
Remove staging areas.
Turn off session recovery.
Reduce error tracing.
Increase buffer memory.
Increase commit intervals.
5. Identifying System Bottlenecks: After you tune the source, target, mapping, and
session, you may consider tuning the system. You can identify system bottlenecks by
using system tools to monitor CPU usage, memory usage, and paging.
On Windows, you can use system tools in the Task Manager or Administrative
Tools.
On UNIX systems you can use system tools such as vmstat and iostat to monitor
system performance.
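On UNIX, the output of a command such as vmstat 5 3 can be checked for the symptoms named above (CPU usage, I/O waits, paging). The following is a rough sketch, assuming the Linux vmstat column layout (si, so, us, sy, wa). The sample text is invented for illustration and is not a real measurement, and the limits of 20% I/O wait and 90% CPU are assumed thresholds.

```python
def parse_vmstat(text):
    """Turn vmstat output into a list of dicts keyed by column name."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    # The column header is the line that contains the 'swpd' column name.
    header_idx = next(i for i, cols in enumerate(lines) if "swpd" in cols)
    names = lines[header_idx]
    return [dict(zip(names, map(int, cols))) for cols in lines[header_idx + 1:]]

def flag_sample(sample, wa_limit=20, cpu_limit=90):
    """Return the system symptoms seen in one vmstat sample (assumed limits)."""
    flags = []
    if sample["si"] > 0 or sample["so"] > 0:     # swap-in / swap-out activity
        flags.append("paging")
    if sample["wa"] >= wa_limit:                 # CPU time spent waiting on I/O
        flags.append("io_wait")
    if sample["us"] + sample["sy"] >= cpu_limit: # user + system CPU time
        flags.append("cpu")
    return flags

# Invented sample in Linux vmstat layout, for illustration only.
SAMPLE = """
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  1  10240 812345  20480 904321   12   30   250  1800  900 1500 20 10 30 40  0
"""

samples = parse_vmstat(SAMPLE)
flags = flag_sample(samples[0])
print(flags)
```

Other UNIX variants print different vmstat columns, so the column names used here would need to be adjusted for them.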
How to improve Session Performance: The goal of performance tuning is to optimize
session performance so that sessions run within the available load window for the
Informatica Server. You can increase session performance in the following ways:
1. Run Concurrent Sessions: Running sessions concurrently in batches reduces
the total time needed to load the data, so concurrent batches may increase session
performance.
2. Partition the Session (Power Center): Partitioning improves session performance by
creating multiple connections to sources and targets and loading data in parallel pipelines.
3. Turn off session recovery.
4. Reduce error tracing.
5. Increase a small cache size.
6. Increase low buffer memory.
7. Increase small commit intervals.
8. Tune Parameters: DTM buffer pool, buffer block size, index cache size, data cache
size, commit interval, and tracing level. If the allocated data or index cache is not large
enough to store the data, the server stores the data in a temporary disk file as it processes
the session data. Each time the server pages to disk, performance slows. You can see
this in the performance counters. Because the data cache generally holds more data than
the index cache, allocate more memory to the data cache than to the index cache.
9. Staging areas: If you use staging areas, you force the Informatica Server to perform
multiple data passes. Removing staging areas may improve session performance.
10. Running the Informatica Server in ASCII mode improves session performance
because ASCII mode stores a character value in one byte, whereas Unicode mode takes
2 bytes to store a character.
11. Aggregator, Rank, and Joiner transformations often decrease session
performance because they must group data before processing it. To improve session
performance in this case, use the sorted ports option.
12. Flat files: If your flat files are stored on a machine other than the Informatica Server,
move those files to the machine that hosts the Informatica Server.
13. Relational data sources: Minimize the connections to sources, targets, and the
Informatica Server to improve session performance. Moving the target database onto the
server system may improve session performance.
14. If a session joins multiple source tables in one Source Qualifier, optimizing the query
may improve performance. Also, single table select statements with an ORDER BY or
GROUP BY clause may benefit from optimization such as adding indexes.
15. You can improve session performance by configuring the network packet size,
which determines how much data crosses the network at one time. To do this, go to the
Server Manager and configure the database connections in the server configuration.
If your target has key constraints and indexes, they slow the loading of data. To improve
session performance in this case, drop the constraints and indexes before you run the
session and rebuild them after the session completes.
Aggregator Performance:
1. Enable the Sorted Input option: Use sorted input to decrease the use of
aggregate caches. It reduces the amount of data cached during the session and
improves session performance. Use this option with the Sorter transformation to
pass sorted data to the Aggregator transformation.
2. Incremental Aggregation: Incremental aggregation calculates the summaries for new
records by using the aggregate cache. This improves session performance.
3. Limit the number of connected input/output ports: Limit the number of
connected input/output or output ports to reduce the amount of data the
Aggregator transformation stores in the data cache.
4. Group by on Simpler Columns: Group by simpler columns. Preferably Numeric
columns.
5. Filter before aggregating: If you use a Filter transformation in the mapping,
place the transformation before the Aggregator transformation to reduce
unnecessary aggregation.
6. Increase the Data and Index Cache Size: You can increase session performance
by increasing the index and data cache sizes in the transformation properties.
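Why sorted input reduces cache usage (item 1 above) can be shown with a small sketch. It is not how the Aggregator transformation is implemented. It only contrasts two generic techniques: with unsorted input every group must stay in memory until the last row, while with input sorted on the group key each group can be emitted as soon as the key changes. The function names and sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

def aggregate_unsorted(rows):
    """Sum amounts per key; every group stays cached until the input ends."""
    totals = {}
    for key, amount in rows:
        totals[key] = totals.get(key, 0) + amount
    return totals, len(totals)  # len(totals) = number of groups held at once

def aggregate_sorted(rows):
    """Sum amounts per key for input already sorted by key.

    Only the current group is held; a result is emitted when the key changes.
    """
    for key, group in groupby(rows, key=itemgetter(0)):
        yield key, sum(amount for _, amount in group)

# Made-up sample rows: (group key, amount).
rows = [("B", 5), ("A", 1), ("B", 2), ("A", 3), ("C", 4)]

totals, groups_cached = aggregate_unsorted(rows)
sorted_totals = list(aggregate_sorted(sorted(rows, key=itemgetter(0))))
print(totals, groups_cached)
print(sorted_totals)
```

In this sketch the sort plays the role of the Sorter transformation placed before the Aggregator.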
Filter Performance:
1. Use the Filter transformation early in the mapping: Use the Filter transformation
early in the mapping to reduce unnecessary Rows.
2. Use the Source Qualifier filter to reduce the number of rows used throughout the
mapping: The Source Qualifier transformation provides an alternate way to filter rows.
Rather than filtering rows from within a mapping, the Source Qualifier transformation
filters rows when they are read from a source. The main difference is that the Source
Qualifier limits the row set extracted from a source, while the Filter transformation limits
the row set sent to a target. Because the Source Qualifier filter runs in the database, you
must make sure that the filter condition in the Source Qualifier transformation uses only
standard SQL. Because the Source Qualifier reduces the number of rows used throughout
the mapping, it provides better performance.
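The difference between filtering at the source and filtering inside the mapping can be sketched with Python's built-in sqlite3 module standing in for the source database. The table orders, its columns, and the row counts are hypothetical and chosen only to make the example runnable.

```python
import sqlite3

# Hypothetical source table: 1000 orders, every 10th one is OPEN.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    ((i, "OPEN" if i % 10 == 0 else "CLOSED") for i in range(1000)),
)
conn.commit()

# Filter-transformation style: read every row, then discard most of them.
all_rows = conn.execute("SELECT order_id, status FROM orders").fetchall()
kept_in_mapping = [r for r in all_rows if r[1] == "OPEN"]

# Source-Qualifier style: the database applies the filter (standard SQL),
# so only the matching rows are extracted and moved.
kept_at_source = conn.execute(
    "SELECT order_id, status FROM orders WHERE status = 'OPEN'"
).fetchall()

print(len(all_rows), len(kept_in_mapping), len(kept_at_source))
```

Both approaches keep the same 100 rows, but the first one extracts 1000 rows from the source to get them.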
Joiner Performance:
1. Enable Sorted Input: It improves the performance of the join. The data must be sorted
on the same ports that are used in the join condition. When you enable the Sorted
Input Option in Joiner, the Power Center Server improves session performance by
minimizing disk input and output. You see the greatest performance improvement when
you work with large data sets.
2. Partition the Pipeline: You can increase the number of partitions in a pipeline to
improve session performance. When you partition a session using a Joiner transformation
that requires sorted input, you must verify the Joiner transformation receives sorted data.
However, partitions that redistribute rows can rearrange the order of sorted data, so it is
important to configure partitions to maintain sorted data.
3. Select the join type carefully: A normal or master outer join performs faster than a
full outer or detail outer join.
4. Perform joins in a database when possible: Performing a join in a database is faster
than performing a join in the session. In some cases, this is not possible, such as joining
tables from two different databases or flat file systems. If you want to perform a join in a
database, you can use the following options:
A. Create a pre-session stored procedure to join the tables in a database.
B. Use the Source Qualifier transformation to perform the join.
5. For an unsorted Joiner transformation, designate as the master source the source
with fewer Rows: For optimal performance and disk storage, designate the master
source as the source with the fewer rows. During a session, the Joiner transformation
compares each row of the master source against the detail source. The fewer unique rows
in the master, the fewer iterations of the join comparison occur, which speeds the join
process.
6. For a sorted Joiner transformation, designate as the master source the source
with fewer duplicate key values: For optimal performance and disk storage, designate
the master source as the source with fewer duplicate key values. When the Power Center
Server processes a sorted Joiner transformation, it caches rows for one hundred keys at a
time. If the master source contains many rows with the same key value, the Power Center
Server must cache more rows, and performance can be slowed.
7. Run the Power Center Server in ASCII mode: When you run the Power Center
Server in Unicode mode, it uses the selected session sort order to sort character data.
When you run the Power Center Server in ASCII mode, it sorts all character data using a
binary sort order. To ensure that data is sorted as the Power Center Server requires, the
database sort order must be the same as the user-defined session sort order. If you pass
unsorted or incorrectly sorted data to a Joiner transformation configured to use sorted
data, the session fails and the Power Center Server logs the error in the session log file.
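The advice in item 5 (use the source with fewer rows as the master for an unsorted Joiner) can be illustrated with a generic cached join. This is a sketch under the assumption that the master rows are cached and the detail rows are streamed past the cache. It is not the Joiner transformation's actual implementation, and the function name and sample data are hypothetical.

```python
def join_with_master_cache(master_rows, detail_rows, key):
    """Normal (inner) join: cache the master rows, stream the detail rows.

    Returns the joined rows and the number of rows that had to be cached.
    """
    cache = {}
    for row in master_rows:                      # build the cache from the master
        cache.setdefault(row[key], []).append(row)
    cached_rows = sum(len(v) for v in cache.values())

    joined = []
    for detail in detail_rows:                   # probe the cache per detail row
        for master in cache.get(detail[key], []):
            joined.append({**master, **detail})
    return joined, cached_rows

# Made-up sample data: 3 departments, 6 employees.
departments = [{"dept_id": d, "dept_name": f"D{d}"} for d in (1, 2, 3)]
employees = [{"dept_id": d, "emp_id": e}
             for e, d in enumerate([1, 1, 2, 3, 3, 3], start=100)]

small_master, cached_small = join_with_master_cache(departments, employees, "dept_id")
large_master, cached_large = join_with_master_cache(employees, departments, "dept_id")
print(len(small_master), cached_small)
print(len(large_master), cached_large)
```

Both choices return the same 6 joined rows, but the smaller master caches 3 rows instead of 6.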
Lookup Performance:
1. Enable the Lookup cache: Improve session performance by enabling the lookup cache
and increasing the lookup cache size.
2. Remove unwanted columns: Remove unwanted columns from the Lookup table.
3. Lookup on small tables: Lookups perform better against small tables.
4. Enable Sorted Input: By enabling the sorted input we can improve the performance
for Flat File lookups.
5. Adding Index to the lookup table: We can create an index for the lookup
table if we have permissions. We can improve the performance for both cached
and uncached lookups. This is important for very large lookup tables. Since the Power
Center Server needs to query, sort and compare values in these columns, the index needs
to include every column used in a lookup condition.
A. Cached lookups: You can improve performance by indexing the columns in the lookup
ORDER BY. The session log contains the ORDER BY statement.
B. Uncached lookups: Because the Power Center Server issues a SELECT statement for
each row passing into the Lookup transformation, you can improve performance by
indexing the columns in the lookup condition.
6. Create an Unconnected Lookup: An unconnected Lookup is not directly part of the data
flow, so it can be called many times in the mapping when you want to return only one
port. It supports only relational sources and a static cache.
7. Use a persistent lookup cache for static lookups: This type of cache is used among
multiple sessions. If the lookup source does not change between sessions, configure the
Lookup transformation to use a persistent lookup cache. The Power Center Server then
saves and reuses cache files from session to session, eliminating the time required to read
the lookup source.
8. Use lookup override: The Lookup transformation runs a default SQL statement. If both
sources are relational, you can use a lookup SQL override to join multiple sources. By
default, the Informatica Server adds an ORDER BY clause. You can override the lookup
SQL in the following circumstances to increase performance:
(1) Override the ORDER BY statement: Override the ORDER BY statement with fewer
columns to increase performance. When you override the ORDER BY statement, you
must suppress the generated ORDER BY statement with a comment notation of two
dashes (--) at the end of the override. Ex:
SELECT ITEMS_DIM.ITEM_NAME, ITEMS_DIM.PRICE FROM
ITEMS_DIM ORDER BY ITEMS_DIM.ITEM_ID, ITEMS_DIM.PRICE --
(2) Add a WHERE statement: Use a lookup SQL override to add a WHERE statement to
the default SQL statement. You might want to use this to reduce the number of rows
included in the cache. Note: The session fails if you include large object ports in a
WHERE clause.
9. Use (=) Operator for several conditions: If a Lookup transformation specifies
several conditions, you can improve lookup performance by placing all the conditions
that use the equality operator (=) first in the list of conditions that appear under the
Condition tab.
10. Divide the lookup mapping into two pipelines:
A. Dedicate one to inserts (rows in the source but not in the target): these are new
rows. Only the new rows come into the mapping, so the process is fast.
B. Dedicate the second to updates (rows in both the source and the target): these are
existing rows. Only the rows that already exist come into the mapping.
11. Cache files on the same Machine: Cache files should be on the
same Machine where informatica server is installed so that it reduces
the time.
12. Use a shared cache or reuse the lookup: If the same lookup
SQL is used in another lookup, use a shared
cache or reuse the lookup.
13. Connect with a Native Database Driver: The Power Center Server can connect
to a lookup table using a native database driver or an ODBC driver. Native database
drivers improve session performance.
14. Join Tables in the database instead of Lookup: If the lookup
table is on the same database as the source table in your mapping and
caching is not feasible, join the tables in the source database rather
than using a Lookup transformation.
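The trade-off between cached and uncached lookups (items 1 and 5 above) comes down to how many queries reach the lookup table. The sketch below uses Python's built-in sqlite3 module as a stand-in for the lookup database and borrows the ITEMS_DIM table name from the override example. The rows, prices, and function names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ITEMS_DIM (ITEM_ID INTEGER PRIMARY KEY, PRICE REAL)")
conn.executemany("INSERT INTO ITEMS_DIM VALUES (?, ?)",
                 [(1, 9.5), (2, 20.0), (3, 3.25)])
conn.commit()

def uncached_lookup(conn, item_ids):
    """One SELECT per input row, as with an uncached lookup."""
    queries, prices = 0, []
    for item_id in item_ids:
        row = conn.execute(
            "SELECT PRICE FROM ITEMS_DIM WHERE ITEM_ID = ?", (item_id,)
        ).fetchone()
        queries += 1
        prices.append(row[0] if row else None)
    return prices, queries

def cached_lookup(conn, item_ids):
    """One SELECT to build the cache, then in-memory lookups."""
    cache = dict(conn.execute("SELECT ITEM_ID, PRICE FROM ITEMS_DIM"))
    return [cache.get(item_id) for item_id in item_ids], 1

source_ids = [1, 2, 2, 3, 1, 4]   # item 4 has no match in the lookup table
uncached_prices, uncached_queries = uncached_lookup(conn, source_ids)
cached_prices, cached_queries = cached_lookup(conn, source_ids)
print(uncached_queries, cached_queries)
```

The cached version reads the whole lookup table once, which is why a WHERE clause in the lookup override (item 8) helps when only part of the table is needed.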
Sequence Generator Performance:
1. Create a Reusable Sequence Generator: Try creating a
reusable Sequence Generator transformation and use it in multiple
mappings. You might reuse a Sequence Generator when you
perform multiple loads to a single target.
2. The Number of Cached Values property determines the number
of values the Informatica Server caches at one time. For non-reusable
Sequence Generator transformations, Number of
Cached Values is set to zero by default. For reusable Sequence
Generator transformations, Number of Cached Values is set to
1000 by default.
Expression Performance:
1. Use Common Logic
2. Minimize aggregate function calls
3. Replace common sub-expressions with local variables.
4. Use operators instead of functions
What are the main parameters to increase the Informatica server performance:
Before doing tuning that is specific to Informatica:
1. Check hard disks on related machines. (Slow disk access on source and target
databases, source and target file systems, as well as the Informatica Server and repository
machines can slow session performance.)
2. Improve network speed. (Slow network connections can slow session performance.)
3. Check CPUs on related machines (make sure the Informatica Server and related
machines run on high performance CPUs.)
4. Configure physical memory for the Informatica Server to minimize disk I/O.
(Configure the physical memory for the Informatica Server machine to minimize paging
to disk.)
5. Optimize database configuration
6. Staging areas: If you use a staging area, you force the Informatica Server to perform
multiple passes on your data. Where possible, remove staging areas to improve
performance.
7. You can run multiple Informatica Servers on separate systems against the same
repository. Distributing the session load to separate Informatica Server systems increases
performance.
Informatica specific:
- Transformation tuning
- Using Caches
- Avoiding Lookups by using DECODE for smaller and frequently used tables
- Applying Filter at the earliest point in the data flow etc.
How the informatica server increases the session performance:
For relational sources, the Informatica Server creates multiple connections
for each partition of a single source and extracts a separate range of
data for each connection. The Informatica Server reads multiple partitions
of a single source concurrently. Similarly, for loading, the Informatica
Server creates multiple connections to the target and loads partitions
of data concurrently.
For XML and file sources, the Informatica Server reads multiple files
concurrently. For loading the data, the Informatica Server creates a
separate file for each partition (of a source file). You can choose to
merge the targets.
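The idea of extracting a separate range of data per partition and reading the partitions concurrently can be sketched as follows. This is an illustration of the general technique only: the function names are hypothetical, and the extract function returns plain numbers where a real partition would run a range query on its own connection.

```python
from concurrent.futures import ThreadPoolExecutor

def key_ranges(low, high, partitions):
    """Split the inclusive key range [low, high] into contiguous sub-ranges."""
    total = high - low + 1
    if partitions < 1 or partitions > total:
        raise ValueError("partitions must be between 1 and the number of keys")
    size, extra = divmod(total, partitions)
    ranges, start = [], low
    for p in range(partitions):
        # The first `extra` partitions take one additional key each.
        end = start + size - 1 + (1 if p < extra else 0)
        ranges.append((start, end))
        start = end + 1
    return ranges

def extract(key_range):
    """Stand-in for one partition's read, e.g. WHERE id BETWEEN lo AND hi."""
    lo, hi = key_range
    return list(range(lo, hi + 1))

ranges = key_ranges(1, 10, 3)
with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
    # map() returns results in partition order, so merging stays deterministic.
    partitions = list(pool.map(extract, ranges))

merged = [row for part in partitions for row in part]
print(ranges)
```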
There are 10 lookups in a mapping and one of them is slow. How can
we find out which one: In the session log file, check which lookup has the maximum
number of records and the time taken to build its cache. If you have any lookup override,
try executing it at the back end and see how much time it takes. You can also start running
the mapping and keep refreshing the session log file; while a lookup is being processed,
the log will be waiting for the lookup cache to be created.
I have 20 lookups, 10 joiners, and 1 normalizer. How will you improve the session
performance: We have to calculate the lookup and joiner cache sizes.
Cache Formula:
For Aggregator Transformation:
Index Cache:
no. of groups * [Σ(column size) + 17]
Data Cache:
no. of groups * [Σ(column size) + 7]
For Lookup Transformation:
Index Cache:
no. of rows in lookup table * [Σ(column size) + 16]
Data Cache:
no. of rows in lookup table * [Σ(column size) + 8]
For Joiner Transformation:
Index Cache:
no. of master rows * [Σ(column size) + 16]
Data Cache:
no. of master rows * [Σ(column size) + 8]
NOTE: The symbol Σ is Sigma. It means the sum of all the column sizes.
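The cache formulas above are simple enough to compute in a few lines. The sketch below takes the overhead constants (17/7 and 16/8) from the formulas as given here. Which columns go into each sum is an assumption on my part (key or condition columns for the index cache, the remaining connected columns for the data cache), and the row counts and column sizes in the examples are made up.

```python
# (index overhead, data overhead) in bytes per row, from the formulas above.
OVERHEAD = {
    "aggregator": (17, 7),
    "lookup": (16, 8),
    "joiner": (16, 8),
}

def cache_sizes(kind, row_count, index_column_sizes, data_column_sizes):
    """Return (index_cache_bytes, data_cache_bytes) for one transformation.

    row_count is the number of groups (Aggregator), lookup table rows
    (Lookup), or master rows (Joiner). Column sizes are in bytes.
    """
    index_overhead, data_overhead = OVERHEAD[kind]
    index_bytes = row_count * (sum(index_column_sizes) + index_overhead)
    data_bytes = row_count * (sum(data_column_sizes) + data_overhead)
    return index_bytes, data_bytes

# Lookup on 100,000 rows: one 10-byte condition column, two output columns.
lookup_index, lookup_data = cache_sizes("lookup", 100_000, [10], [20, 8])
# Aggregator with 500 groups: two 4-byte group-by columns, one 8-byte output.
agg_index, agg_data = cache_sizes("aggregator", 500, [4, 4], [8])

print(lookup_index, lookup_data)   # 2600000 3600000
print(agg_index, agg_data)         # 12500 7500
```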
Calculations for the data and index caches are given in the Informatica manual in
detail. Regarding changing your cache size: if the cache size is less than what is
required, excess data is paged to hard disk, which slows the session and hence
reduces performance. If the cache size is bigger than required, it takes more
space in main memory, and if memory is not available, the session
fails.
If we have 4 lookups, how do you increase the performance:
1. First decide whether a cache is essential, depending on the number of source and
lookup rows.
2. By using a lookup override, you can improve session performance by joining
multiple lookups.
