You can configure the Lookup transformation to perform different types of lookups. You can
configure the transformation to be connected or unconnected, cached or uncached:
Connected or unconnected.
Connected and unconnected transformations receive input and send output in different ways.
Cached or uncached.
Sometimes you can improve session performance by caching the lookup table. If you cache the
lookup table, you can choose to use a dynamic or static cache. By default, the lookup cache
remains static and does not change during the session. With a dynamic cache, the Informatica
Server inserts or updates rows in the cache during the session. When you cache the target table
as the lookup, you can look up values in the target and insert them if they do not exist, or update
them if they do.
PERSISTENT CACHE-
If you want to save and reuse the cache files, you can configure the transformation to use a
persistent cache. Use a persistent cache when you know the lookup table does not change
between session runs.
The first time the Informatica Server runs a session using a persistent lookup cache, it saves the
cache files to disk instead of deleting them. The next time the Informatica Server runs the
session, it builds the memory cache from the cache files. If the lookup table changes
occasionally, you can override session properties to recache the lookup from the database.
NONPERSISTENT CACHE-
By default, the Informatica Server uses a non-persistent cache when you enable caching in a
Lookup transformation. The Informatica Server deletes the cache files at the end of a session.
The next time you run the session, the Informatica Server builds the memory cache from the
database.
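To make the persistent and non-persistent behaviour concrete, here is a minimal Python sketch. It assumes a lookup cache can be modeled as a dictionary pickled to a file. The names `query_lookup_table`, `run_session` and the cache path are hypothetical, and the Informatica Server does not store its cache files this way. The sketch only counts how often the "database" is queried.

```python
import os
import pickle
import tempfile

def query_lookup_table(db_calls):
    """Hypothetical stand-in for reading the lookup table from the database."""
    db_calls.append("db")  # record every trip to the database
    return {101: "Smith", 102: "Jones"}

def run_session(cache_path, persistent, db_calls, recache=False):
    """Simulate one session run and return the lookup cache it used."""
    if persistent and not recache and os.path.exists(cache_path):
        # Persistent cache saved by an earlier run: build the memory cache from the file.
        with open(cache_path, "rb") as f:
            cache = pickle.load(f)
    else:
        # First run, non-persistent cache, or recache requested: query the database.
        cache = query_lookup_table(db_calls)
        with open(cache_path, "wb") as f:
            pickle.dump(cache, f)
    if not persistent:
        # Non-persistent cache: delete the cache file at the end of the session.
        os.remove(cache_path)
    return cache

if __name__ == "__main__":
    calls = []
    path = os.path.join(tempfile.mkdtemp(), "lookup.cache")
    run_session(path, persistent=True, db_calls=calls)
    run_session(path, persistent=True, db_calls=calls)
    print(len(calls))  # 1: the second run reused the saved cache file
```

Passing `recache=True` plays the role of overriding the session properties to recache the lookup from the database.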
You might want to configure the transformation to use a dynamic cache when the target table is
also the lookup table. When you use a dynamic cache, the Informatica Server updates the lookup
cache as it passes rows to the target.
The Informatica Server builds the cache when it processes the first lookup request. It queries the
cache based on the lookup condition for each row that passes into the transformation.
When the Informatica Server reads a row from the source, it updates the lookup cache by
performing one of the following actions:
Inserts the row into the cache.
Updates the row in the cache.
Makes no change to the cache.
You can use the Source Qualifier to perform the following tasks:
Join data originating from the same source database. You can join two or more tables with
primary-foreign key relationships by linking the sources to one Source Qualifier.
Filter records when the Informatica Server reads source data. If you include a filter condition, the
Informatica Server adds a WHERE clause to the default query.
Specify an outer join rather than the default inner join. If you include a user-defined join, the
Informatica Server replaces the join information specified by the metadata in the SQL query.
Specify sorted ports. If you specify a number for sorted ports, the Informatica Server adds an
ORDER BY clause to the default SQL query.
Select only distinct values from the source. If you choose Select Distinct, the Informatica Server
adds a SELECT DISTINCT statement to the default SQL query.
Create a custom query to issue a special SELECT statement for the Informatica Server to read
source data. For example, you might use a custom query to perform aggregate calculations or
execute a stored procedure.
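The Source Qualifier options above each add a clause to the default query. A minimal string-building sketch follows. It assumes the default join appears as a WHERE condition, the filter is ANDed onto it, and "number of sorted ports" means the first N ports. `build_default_query` is a hypothetical helper, not the Informatica Server's own query generator.

```python
def build_default_query(tables, columns, join_condition=None, source_filter=None,
                        sorted_ports=0, select_distinct=False):
    """Sketch of how Source Qualifier options change the generated SELECT."""
    select = "SELECT DISTINCT" if select_distinct else "SELECT"
    sql = f"{select} {', '.join(columns)} FROM {', '.join(tables)}"
    # Join condition (from PK-FK metadata or a user-defined join) plus any source filter.
    conditions = [c for c in (join_condition, source_filter) if c]
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    if sorted_ports:
        # Assumed: the first `sorted_ports` columns become the ORDER BY list.
        sql += " ORDER BY " + ", ".join(columns[:sorted_ports])
    return sql

if __name__ == "__main__":
    print(build_default_query(["EMP", "DEPT"], ["EMP.DEPTNO", "EMP.ENAME"],
                              join_condition="EMP.DEPTNO = DEPT.DEPTNO",
                              source_filter="EMP.SAL > 1000",
                              sorted_ports=1, select_distinct=True))
```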
When the workflow reaches a session, the Load Manager starts the DTM process. The DTM
process is the process associated with the session task. The Load Manager creates one DTM
process for each session in the workflow. The DTM process performs the following tasks:
The DTM allocates process memory for the session and divides it into buffers. This is also known
as buffer memory. The default memory allocation is 12,000,000 bytes. The DTM uses multiple
threads to process data. The main DTM thread is called the master thread.
The master thread creates and manages other threads. The master thread for a session can
create mapping, pre-session, post-session, reader, transformation, and writer threads.
Mapping Thread - One thread for each session. Fetches session and mapping information. Compiles the mapping. Cleans up after session execution.
Pre- and Post-Session Threads - One thread each to perform pre- and post-session operations.
Reader Thread - One thread for each partition for each source pipeline. Reads from sources. Relational sources use relational reader threads, and file sources use file reader threads.
Transformation Thread - One or more transformation threads for each partition. Processes data according to the transformation logic in the mapping.
Writer Thread - One thread for each partition, if a target exists in the source pipeline. Writes to targets. Relational targets use relational writer threads, and file targets use file writer threads.
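The per-thread rules above can be turned into a rough count. This is a minimal sketch, assuming the pre- and post-session threads are always counted. `transformation_threads_per_partition` is a hypothetical parameter, because the text only says "one or more".

```python
def dtm_threads(partitions, source_pipelines, pipelines_with_target,
                transformation_threads_per_partition=1):
    """Rough DTM thread count for one session, following the rules listed above.

    Assumes pre- and post-session threads always exist; the transformation
    thread parameter is hypothetical (the text only says "one or more").
    """
    counts = {
        "master": 1,
        "mapping": 1,                                   # one per session
        "pre_post_session": 2,                          # one each
        "reader": partitions * source_pipelines,        # per partition, per source pipeline
        "transformation": partitions * transformation_threads_per_partition,
        "writer": partitions * pipelines_with_target,   # per partition where a target exists
    }
    counts["total"] = sum(counts.values())
    return counts

if __name__ == "__main__":
    print(dtm_threads(partitions=2, source_pipelines=1, pipelines_with_target=1))
```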
Ans: If you use a flat file as a target, you can configure the Informatica Server to create an
indicator file for target row type information. For each target row, the indicator file contains a
number to indicate whether the row was marked for insert, update, delete, or reject. The
Informatica Server names this file target_name.ind and stores it in the same directory as the
target file.
To configure it, go to INFORMATICA SERVER SETUP - CONFIGURATION TAB - click INDICATOR
FILE SETTINGS.
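Here is a small sketch of reading such an indicator file and tallying row types. It assumes one numeric code per line, which is a file-format assumption. It also assumes the codes 0 to 3 match the DD_INSERT, DD_UPDATE, DD_DELETE and DD_REJECT values used by the Update Strategy transformation. `summarize_indicator_file` is a hypothetical helper.

```python
from collections import Counter

# Assumed codes: the DD_INSERT/DD_UPDATE/DD_DELETE/DD_REJECT values 0..3.
ROW_TYPES = {0: "insert", 1: "update", 2: "delete", 3: "reject"}

def summarize_indicator_file(lines):
    """Count rows per type, assuming one numeric code per line (format assumption)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:
            counts[ROW_TYPES[int(line)]] += 1
    return dict(counts)

if __name__ == "__main__":
    # With a real file: with open("target_name.ind") as f: summarize_indicator_file(f)
    print(summarize_indicator_file(["0\n", "0\n", "1\n", "3\n"]))
```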
Suppose a session is configured with a commit interval of 10,000 rows and the source has 50,000 rows.
Explain the commit points for source-based commit and target-based commit. Assume appropriate
values wherever required.
a) For example, a session is configured with a target-based commit interval of 10,000. The writer
buffers fill every 7,500 rows. When the Informatica Server reaches the commit interval of 10,000,
it continues processing data until the writer buffer is filled. The second buffer fills at 15,000 rows,
and the Informatica Server issues a commit to the target. If the session completes successfully,
the Informatica Server issues commits after 15,000, 22,500, 30,000, and 40,000 rows.
b) The Informatica Server might commit fewer rows to the target than the number of rows produced
by the active source. For example, you have a source-based commit session that passes 10,000
rows through an active source, and 3,000 rows are dropped due to transformation logic. The
Informatica Server issues a commit to the target when the 7,000 remaining rows reach the
target.
The number of rows held in the writer buffers does not affect the commit point for a source-based
commit session. For example, you have a source-based commit session that passes 10,000 rows
through an active source. When those 10,000 rows reach the targets, the Informatica Server
issues a commit. If the session completes successfully, the Informatica Server issues commits
after 10,000, 20,000, 30,000, and 40,000 source rows.
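Source-based commit is simple enough to model directly. Here is a minimal sketch, assuming a commit fires after every `commit_interval` rows from the active source, plus a final commit when the session completes successfully. Target-based commit is not modeled because it also depends on when the writer buffers fill. Under these assumptions, the 50,000-row source in the question would also get a commit at 50,000. Both helper names are hypothetical.

```python
def source_based_commit_points(source_rows, commit_interval):
    """Commit every `commit_interval` source rows, plus a final commit at session end."""
    points = list(range(commit_interval, source_rows + 1, commit_interval))
    if not points or points[-1] != source_rows:
        points.append(source_rows)  # assumed final commit when the session completes
    return points

def rows_committed(rows_from_active_source, rows_dropped):
    """Rows dropped by transformation logic never reach the target."""
    return rows_from_active_source - rows_dropped

if __name__ == "__main__":
    print(source_based_commit_points(50_000, 10_000))
    print(rows_committed(10_000, 3_000))
```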
Enable monitoring
Increase Load Manager shared memory
Understand performance counters
b) Source Qualifier
BufferInput_efficiency - Percentage reflecting how seldom the reader waited for a free buffer when passing data to the DTM.
BufferOutput_efficiency - Percentage reflecting how seldom the DTM waited for a full buffer of data from the reader.
Target
BufferInput_efficiency - Percentage reflecting how seldom the DTM waited for a free buffer when passing data to the writer.
BufferOutput_efficiency - Percentage reflecting how seldom the Informatica Server waited for a full buffer of data from the writer.
For Source Qualifiers and targets, a high value is considered 80-100 percent. Low is considered
0-20 percent. However, any dramatic difference in a given set of BufferInput_efficiency and
BufferOutput_efficiency counters indicates inefficiencies that may benefit from tuning.
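The 80-100 and 0-20 percent bands translate directly into a classifier. Here is a minimal sketch. The `max_gap` threshold for a "dramatic difference" is a hypothetical value, because the text does not define one.

```python
def classify(pct):
    """High is 80-100 percent, low is 0-20 percent, anything else is in between."""
    if pct >= 80:
        return "high"
    if pct <= 20:
        return "low"
    return "medium"

def check_counters(buffer_input_eff, buffer_output_eff, max_gap=50):
    """Flag a pair of counters for tuning; max_gap is a hypothetical threshold."""
    return {
        "input": classify(buffer_input_eff),
        "output": classify(buffer_output_eff),
        "needs_tuning": abs(buffer_input_eff - buffer_output_eff) > max_gap,
    }

if __name__ == "__main__":
    print(check_counters(95, 10))
```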
informatica:
Ans: Load manager is the primary Informatica server process. It performs the following tasks:
a. Manages sessions and batch scheduling.
b. Locks the sessions and reads properties.
c. Reads parameter files.
d. Expands the server and session variables and parameters.
e. Verifies permissions and privileges.
f. Validates source and target code pages.
g. Creates session log files.
h. Creates Data Transformation Manager (DTM) process, which executes the session.
When you run a session, the Informatica Server writes a message in the session log indicating
the cache file name and the transformation name. When a session completes, the Informatica
Server typically deletes index and data cache files. However, you may find index and data files in
the cache directory under the following circumstances:
Table 21-2. Cache File Naming Convention
Transformation Type    Index File Name    Data File Name
Aggregator             PMAGG*.idx         PMAGG*.dat
Rank                   PMAGG*.idx         PMAGG*.dat
Joiner                 PMJNR*.idx         PMJNR*.dat
Lookup                 PMLKP*.idx         PMLKP*.dat
If a cache file handles more than 2 GB of data, the Informatica Server creates multiple index and
data files. When creating these files, the Informatica Server appends a number to the end of the
filename, such as PMAGG*.idx1 and PMAGG*.idx2. The number of index and data files is
limited only by the amount of disk space available in the cache directory.
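Table 21-2 and the 2 GB rule can be combined into a small naming sketch. This rests on several assumptions. "2 GB" is taken as 2 × 1024³ bytes. A single file keeps the plain name, and numbering starts at 1 once the cache spills into several files. The `suffix` argument stands in for the `*` part of the name, which the table does not specify.

```python
import math

CACHE_PREFIXES = {"Aggregator": "PMAGG", "Rank": "PMAGG", "Joiner": "PMJNR", "Lookup": "PMLKP"}
MAX_FILE_BYTES = 2 * 1024 ** 3  # assumed meaning of "2 GB"

def cache_file_names(transformation_type, suffix, ext, size_bytes):
    """Sketch of cache file names; `suffix` is a hypothetical stand-in for '*'."""
    base = f"{CACHE_PREFIXES[transformation_type]}{suffix}.{ext}"
    parts = max(1, math.ceil(size_bytes / MAX_FILE_BYTES))
    if parts == 1:
        return [base]
    # More than 2 GB: append a number to each file name (assumed to start at 1).
    return [f"{base}{i}" for i in range(1, parts + 1)]

if __name__ == "__main__":
    print(cache_file_names("Aggregator", "_demo", "idx", 5 * 1024 ** 3))
```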
Using the Normalizer transformation, you break out repeated data within a record into separate
records. For each new record it creates, the Normalizer transformation generates a unique
identifier. You can use this key value to join the normalized records.
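Here is a minimal Python sketch of the idea. It assumes, for illustration, that the generated key `gk` is assigned once per input record and shared by every record broken out of it, which is what lets them be joined back together. An occurrence number `gcid` tells those records apart, so the pair (gk, gcid) is unique for each new record. Both field names and the `normalize` helper are hypothetical.

```python
def normalize(records, key_field, repeated_fields):
    """Break repeated columns of each record into separate records.

    gk:   hypothetical generated key, one per input record (shared by its output rows)
    gcid: hypothetical occurrence number of the repeated field (1-based)
    """
    out = []
    for gk, rec in enumerate(records, start=1):
        for gcid, field in enumerate(repeated_fields, start=1):
            out.append({"gk": gk, key_field: rec[key_field], "gcid": gcid, "value": rec[field]})
    return out

if __name__ == "__main__":
    stores = [{"store": "A", "q1": 10, "q2": 20}, {"store": "B", "q1": 5, "q2": 7}]
    for row in normalize(stores, "store", ["q1", "q2"]):
        print(row)
```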
This is also possible in the Source Analyzer:
Open table1 (the PK table), choose Edit > Ports > Key Type, and select Primary Key.
Open table2 (the FK table), choose Edit > Ports > Key Type, select Foreign Key, then select the referenced table name and column name from the options below.
If the source changes only incrementally and you can capture changes, you can configure the
session to process only those changes. This allows the Informatica Server to update your target
incrementally, rather than forcing it to process the entire source and recalculate the same
calculations each time you run the session. Therefore, only use incremental aggregation if:
Your mapping includes an aggregate function.
The source changes only incrementally.
You can capture incremental changes. You might do this by filtering source data by timestamp.
Before implementing incremental aggregation, consider the following issues:
Whether it is appropriate for the session
What to do before enabling incremental aggregation
When to reinitialize the aggregate caches
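The conditions above (an aggregate function, incremental source changes, changes captured by timestamp) can be sketched in a few lines. This is a conceptual model only. A dictionary of running sums stands in for the aggregate cache, and a stored timestamp stands in for "capture incremental changes". The Informatica Server keeps its aggregate cache in index and data files, not like this.

```python
from datetime import datetime

class IncrementalAggregator:
    """Keeps per-group sums between runs, in the spirit of incremental aggregation."""

    def __init__(self):
        self.cache = {}        # group -> running sum (stands in for the aggregate cache)
        self.last_run = None   # newest timestamp already processed

    def run(self, source_rows):
        """source_rows: iterable of (timestamp, group, amount); only new rows are used."""
        new_rows = [r for r in source_rows if self.last_run is None or r[0] > self.last_run]
        for _ts, group, amount in new_rows:
            self.cache[group] = self.cache.get(group, 0) + amount
        if new_rows:
            self.last_run = max(r[0] for r in new_rows)
        return dict(self.cache), len(new_rows)

    def reinitialize(self):
        """Start over, e.g. after the source was changed in a non-incremental way."""
        self.cache.clear()
        self.last_run = None

if __name__ == "__main__":
    agg = IncrementalAggregator()
    day1 = [(datetime(2024, 1, 1), "A", 10), (datetime(2024, 1, 2), "B", 5)]
    print(agg.run(day1))
    print(agg.run(day1 + [(datetime(2024, 1, 3), "A", 7)]))
```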
Scenario: The Informatica Server and Client are on different machines. You run a session from the
Server Manager by specifying the source and target databases. It displays an error. You are
confident that everything is correct. Why is it displaying the error?
The connect strings for the source and target databases are not configured on the workstation
containing the server, though they may be configured on the client machine.
You can improve performance by creating a concurrent batch to run several sessions in parallel on
one Informatica Server. If you have several independent sessions that use separate sources and
separate mappings to populate different targets, you can place them in a concurrent batch and run
them at the same time. If you have a complex mapping with multiple sources, you can separate the
mapping into several simpler mappings with separate sources. Similarly, if you have a session
performing a minimal number of transformations on large amounts of data, such as moving flat files
to a staging area, you can separate it into multiple sessions and run them concurrently in a batch,
cutting the total run time dramatically.
Ans. After the load manager performs validations for the session, it creates the DTM process.
The DTM process is the second process associated with the session run. The primary purpose of
the DTM process is to create and manage threads that carry out the session tasks.
The DTM allocates process memory for the session and divides it into buffers. This is also known
as buffer memory. It creates the main thread, which is called the master thread. The master
thread creates and manages all other threads.
If we partition a session, the DTM creates a set of threads for each partition to allow concurrent
processing. When the Informatica Server writes messages to the session log, it includes the thread type
and thread ID. Following are the types of threads that the DTM creates:
• MASTER THREAD - Main thread of the DTM process. Creates and manages all other threads.
• MAPPING THREAD - One thread for each session. Fetches session and mapping information.
• PRE- AND POST-SESSION THREADS - One thread each to perform pre- and post-session operations.
• READER THREAD - One thread for each partition for each source pipeline.
• WRITER THREAD - One thread for each partition, if a target exists in the source pipeline. Writes to the target.
• TRANSFORMATION THREAD - One or more transformation threads for each partition.
informatica: How is the Sequence Generator transformation different from other transformations?
Ans: The Sequence Generator is unique among all transformations because we cannot add, edit,
or delete its default ports (NEXTVAL and CURRVAL).
Unlike other transformations, we cannot override the Sequence Generator transformation
properties at the session level. This protects the integrity of the sequence values generated.
Ans: We can make a Sequence Generator reusable, and use it in multiple mappings. We might
reuse a Sequence Generator when we perform multiple loads to a single target.
For example, if we have a large input file that we separate into three sessions running in parallel,
we can use a Sequence Generator to generate primary key values. If we use different Sequence
Generators, the Informatica Server might accidentally generate duplicate key values. Instead, we
can use the same reusable Sequence Generator for all three sessions to provide a unique value for each target row.
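Here is a small Python sketch of why one shared generator matters when a large file is split into three parallel loads. A thread-safe counter stands in for a reusable Sequence Generator's NEXTVAL. The class and function names are hypothetical, and each thread stands in for one session.

```python
import itertools
import threading

class SequenceGenerator:
    """Thread-safe counter standing in for a Sequence Generator's NEXTVAL."""

    def __init__(self, start=1):
        self._counter = itertools.count(start)
        self._lock = threading.Lock()

    def nextval(self):
        with self._lock:
            return next(self._counter)

def load_partition(rows, seq, keys):
    """Hypothetical session: assign a primary key to every row it loads."""
    for _ in rows:
        keys.append(seq.nextval())

def run_three_sessions(shared):
    chunks = [range(100), range(100), range(100)]  # one large file split three ways
    keys = []
    shared_seq = SequenceGenerator()
    threads = []
    for chunk in chunks:
        seq = shared_seq if shared else SequenceGenerator()  # shared vs. separate generators
        threads.append(threading.Thread(target=load_partition, args=(chunk, seq, keys)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return keys

if __name__ == "__main__":
    print(len(set(run_three_sessions(shared=True))))   # 300 distinct keys
    print(len(set(run_three_sessions(shared=False))))  # 100: duplicates across sessions
```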
Ans: We can configure a connected Lookup transformation to receive input directly from
the mapping pipeline, or we can configure an unconnected Lookup transformation to
receive input from the result of an expression in another transformation.
An unconnected Lookup transformation exists separate from the pipeline in the mapping.
We write an expression using the :LKP reference qualifier to call the lookup within another
transformation.
Ans:
We use a Lookup transformation in our mapping to look up data in a relational table, view or
synonym.
Get a related value. For example, our source table includes employee ID, but we want to
include the employee name in our target table to make our summary data easier to read.
Perform a calculation. Many normalized tables include values used in a calculation, such as
gross sales per invoice or sales tax, but not the calculated value (such as net sales).
Update slowly changing dimension tables. We can use a Lookup transformation to determine
whether records already exist in the target.
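The three uses above can be sketched as dictionary lookups against an in-memory copy of the lookup table, which is roughly what a cached lookup provides. The table contents, field names and `enrich` helper are hypothetical. Net sales is taken as gross minus tax purely for illustration.

```python
# Hypothetical lookup tables held in memory, as a cached lookup would be.
EMPLOYEES = {101: "Smith", 102: "Jones"}          # employee ID -> name
INVOICES = {1: {"gross": 100.0, "tax": 8.0}}      # invoice ID -> stored values

def enrich(row, employees, invoices, target_keys):
    """Apply the three lookup uses to one source row."""
    invoice = invoices.get(row["invoice_id"])
    return {
        **row,
        # Get a related value; None stands in for NULL when there is no match.
        "emp_name": employees.get(row["emp_id"]),
        # Perform a calculation the normalized table does not store (illustrative formula).
        "net_sales": invoice["gross"] - invoice["tax"] if invoice else None,
        # Slowly changing dimension check: does the record already exist in the target?
        "exists_in_target": row["emp_id"] in target_keys,
    }

if __name__ == "__main__":
    print(enrich({"emp_id": 101, "invoice_id": 1}, EMPLOYEES, INVOICES, {101}))
```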
Ans: The lookup table can be a single table, or we can join multiple tables in the same database
using a lookup query override. The Informatica Server queries the lookup table or an in-memory
cache of the table for all incoming rows into the Lookup transformation.
If your mapping includes heterogeneous joins, we can use any of the mapping sources or
mapping targets as the lookup table.
When we design our data warehouse, we need to decide what type of information to store in
targets. As part of our target table design, we need to determine whether to maintain all the
historic data or just the most recent changes.
The model we choose constitutes our update strategy, how to handle changes to existing
records.
Update strategy flags a record for update, insert, delete, or reject. We use this transformation
when we want to exert fine control over updates to a target, based on some condition we apply.
For example, we might use the Update Strategy transformation to flag all customer records for
update when the mailing address has changed, or flag all employee records for reject for people
no longer working for the company.
b) Expression transformation: You can use the Expression transformations to calculate values in
a single row before you write to the target. For example, you might need to adjust employee
salaries, concatenate first and last names, or convert strings to numbers. You can use the
Expression transformation to perform any non-aggregate calculations. You can also use the
Expression transformation to test conditional statements before you output the results to target
tables or other transformations.
c) Filter transformation: The Filter transformation provides the means for filtering rows in a
mapping. You pass all the rows from a source transformation through the Filter transformation,
and then enter a filter condition for the transformation. All ports in a Filter transformation are
input/output, and only rows that meet the condition pass through the Filter transformation.
d) Joiner transformation: While a Source Qualifier transformation can join data originating from a
common source database, the Joiner transformation joins two related heterogeneous sources
residing in different locations or file systems.
e) Lookup transformation: Use a Lookup transformation in your mapping to look up data in a
relational table, view, or synonym. Import a lookup definition from any relational database to
which both the Informatica Client and Server can connect. You can use multiple Lookup
transformations in a mapping.
The Informatica Server queries the lookup table based on the lookup ports in the transformation.
It compares Lookup transformation port values to lookup table column values based on the
lookup condition. Use the result of the lookup to pass to other transformations and the target.
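Several of the transformations described above can be chained in a toy row pipeline: Expression, then Filter, then Update Strategy. This is a minimal sketch. To our knowledge the DD_* values 0 to 3 are the row-type constants an Update Strategy expression returns. Everything else is hypothetical: the field names, the 5% raise, and skipping unchanged rows.

```python
# Row-type constants as used by the Update Strategy transformation.
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3

def expression(row):
    """Row-level, non-aggregate calculations, as in an Expression transformation."""
    return {
        **row,
        "full_name": f"{row['first']} {row['last']}",          # concatenate names
        "salary": round(float(row["salary"]) * 1.05, 2),        # string -> number, hypothetical 5% raise
    }

def passes_filter(row):
    """Filter condition: only rows for which this is True pass through."""
    return row["salary"] > 0

def update_strategy(row, target):
    """Flag a row based on a condition checked against the current target rows."""
    if not row["active"]:
        return DD_REJECT          # no longer working for the company
    current = target.get(row["emp_id"])
    if current is None:
        return DD_INSERT          # not in the target yet
    if current["address"] != row["address"]:
        return DD_UPDATE          # mailing address changed
    return None                   # unchanged: this sketch does not flag it

def run_mapping(source, target):
    flagged = []
    for row in source:
        row = expression(row)
        if passes_filter(row):
            flagged.append((row["emp_id"], update_strategy(row, target)))
    return flagged
```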
What is a transformation?
A transformation is a repository object that generates, modifies, or passes data. You configure
logic in a transformation that the Informatica Server uses to transform data. The Designer
provides a set of transformations that perform specific functions. For example, an Aggregator
transformation performs calculations on groups of data.
Each transformation has rules for configuring and connecting in a mapping. For more information
about working with a specific transformation, refer to the chapter in this book that discusses that
particular transformation.
You can create transformations to use once in a mapping, or you can create reusable
transformations to use in multiple mappings.
When you use event-based scheduling, the Informatica Server starts a session when it locates
the specified indicator file. To use event-based scheduling, you need a shell command, script, or
batch file to create an indicator file when all sources are available. The file must be created or
sent to a directory local to the Informatica Server. The file can be of any format recognized by the
Informatica Server operating system. The Informatica Server deletes the indicator file once the
session starts.
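Event-based scheduling amounts to "wait for the indicator file, start the session, delete the file". Here is a polling sketch of that sequence. The `wait_for_indicator` function, the `start_session` callback and the polling interval are hypothetical. The Informatica Server does its own watching, and this is not its implementation.

```python
import os
import tempfile
import time

def wait_for_indicator(path, start_session, poll_seconds=1.0, timeout=None):
    """Poll for the indicator file; when it appears, start the session and delete it."""
    waited = 0.0
    while not os.path.exists(path):
        if timeout is not None and waited >= timeout:
            return False          # gave up: the sources never became available
        time.sleep(poll_seconds)
        waited += poll_seconds
    start_session()
    os.remove(path)               # the indicator file is deleted once the session starts
    return True

if __name__ == "__main__":
    d = tempfile.mkdtemp()
    ind = os.path.join(d, "sources_ready.ind")  # hypothetical file name
    open(ind, "w").close()                      # what the shell script would do
    print(wait_for_indicator(ind, lambda: print("session started"), poll_seconds=0.1))
```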
Use the following syntax to ping the Informatica Server on a UNIX system:
pmcmd ping [{user_name | %user_env_var} {password | %password_env_var}]
[hostname:]portno
Use the following syntax to start a session or batch on a UNIX system:
pmcmd start {user_name | %user_env_var} {password | %password_env_var} [hostname:]portno
[folder_name:]{session_name | batch_name} [:pf=param_file] session_flag wait_flag
Use the following syntax to stop a session or batch on a UNIX system:
pmcmd stop {user_name | %user_env_var} {password | %password_env_var}
[hostname:]portno[folder_name:]{session_name | batch_name} session_flag
Use the following syntax to stop the Informatica Server on a UNIX system:
pmcmd stopserver {user_name | %user_env_var} {password | %password_env_var}
[hostname:]portno
The need to share data is just as pressing as the need to share metadata. Often, several data
marts in the same organization need the same information. For example, several data marts may
need to read the same product data from operational sources, perform the same profitability
calculations, and format this information to make it easy to review.
If each data mart reads, transforms, and writes this product data separately, the throughput for
the entire organization is lower than it could be. A more efficient approach would be to read,
transform, and write the data to one central data store shared by all data marts. Transformation is
a processing-intensive task, so performing the profitability calculations once saves time.
Therefore, this kind of dynamic data store (DDS) improves throughput at the level of the entire
organization, including all data marts. To improve performance further, you might want to capture
incremental changes to sources. For example, rather than reading all the product data each time
you update the DDS, you can improve performance by capturing only the inserts, deletes, and
updates that have occurred in the PRODUCTS table since the last time you updated the DDS.
The DDS has one additional advantage beyond performance: when you move data into the DDS,
you can format it in a standard fashion. For example, you can prune sensitive employee data that
should not be stored in any data mart. Or you can display date and time values in a standard
format. You can perform these and other data cleansing tasks when you move data into the DDS
instead of performing them repeatedly in separate data marts.
Detailed descriptions for database objects, flat files, Cobol files, or XML files to receive
transformed data. During a session, the Informatica Server writes the resulting data to session
targets. Use the Warehouse Designer tool in the Designer to import or create target definitions.
Detailed descriptions of database objects (tables, views, synonyms), flat files, XML files, or Cobol
files that provide source data. For example, a source definition might be the complete structure of
the EMPLOYEES table, including the table name, column names and datatypes, and any
constraints applied to these columns, such as NOT NULL or PRIMARY KEY. Use the Source
Analyzer tool in the Designer to import and create source definitions.
As mentioned, data in a warehouse comes from the transactions. Fact table in a data warehouse
consists of facts and/or measures. The nature of data in a fact table is usually numerical.
On the other hand, dimension table in a data warehouse contains fields used to describe the data
in fact tables. A dimension table can provide additional and descriptive information (dimension) of
the field of a fact table.
e.g. If I want to know the number of resources used for a task, my fact table will store the actual
measure (of resources) while my Dimension table will store the task and resource details.
Hence, the relationship between a dimension table and a fact table is one-to-many: one dimension row (a task) describes many fact rows.
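Here is a tiny star-schema sketch of the task and resources example, using Python's built-in sqlite3. The table and column names are hypothetical. The point is that one row in the dimension table (a task) joins to many rows in the fact table (resource measures), and the measures are summed per task.

```python
import sqlite3

def resources_by_task():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE task_dim (task_id INTEGER PRIMARY KEY, task_name TEXT);
        CREATE TABLE resource_fact (task_id INTEGER REFERENCES task_dim(task_id),
                                    resources_used INTEGER);
        INSERT INTO task_dim VALUES (1, 'Build'), (2, 'Test');
        -- Many fact rows per dimension row: task 1 appears twice.
        INSERT INTO resource_fact VALUES (1, 3), (1, 2), (2, 4);
    """)
    rows = conn.execute("""
        SELECT d.task_name, SUM(f.resources_used)
        FROM resource_fact f JOIN task_dim d ON d.task_id = f.task_id
        GROUP BY d.task_name
        ORDER BY d.task_name
    """).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    print(resources_by_task())
```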
informatica: When should you create the dynamic data store? Do you need a DDS at all?
To decide whether you should create a dynamic data store (DDS), consider the following issues:
• How much data do you need to store in the DDS? The one principal advantage of data marts is
the selectivity of information included in them. Instead of a copy of everything potentially relevant
from the OLTP database and flat files, data marts contain only the information needed to answer
specific questions for a specific audience (for example, sales performance data used by the sales
division). A dynamic data store is a hybrid of the galactic warehouse and the individual data mart,
since it includes all the data needed for all the data marts it supplies. If the dynamic data store
contains nearly as much information as the OLTP source, you might not need the intermediate
step of the dynamic data store. However, if the dynamic data store includes substantially less
than all the data in the source databases and flat files, you should consider creating a DDS
staging area.
• What kind of standards do you need to enforce in your data marts? Creating a DDS is an
important technique in enforcing standards. If data marts depend on the DDS for information, you
can provide that data in the range and format you want everyone to use. For example, if you want
all data marts to include the same information on customers, you can put all the data needed for
this standard customer profile in the DDS. Any data mart that reads customer data from the DDS
should include all the information in this profile.
• How often do you update the contents of the DDS? If you plan to frequently update data in data
marts, you need to update the contents of the DDS at least as often as you update the individual
data marts that the DDS feeds. You may find it easier to read data directly from source databases
and flat file systems if it becomes burdensome to update the DDS fast enough to keep up with the
needs of individual data marts. Or, if particular data marts need updates significantly faster than
others, you can bypass the DDS for these fast update data marts.
• Is the data in the DDS simply a copy of data from source systems, or do you plan to reformat
this information before storing it in the DDS? One advantage of the dynamic data store is that, if
you plan on reformatting information in the same fashion for several data marts, you only need to
format it once for the dynamic data store. Part of this question is whether you keep the data
normalized when you copy it to the DDS.
• How often do you need to join data from different systems? On occasion, you may need to join
records queried from different databases or read from different flat file systems. The more
frequently you need to perform this type of heterogeneous join, the more advantageous it would
be to perform all such joins within the DDS, then make the results available to all data marts that
use the DDS as a source.
With PowerCenter, you receive all product functionality, including the ability to register multiple
servers, share metadata across repositories, and partition data.
A PowerCenter license lets you create a single repository that you can configure as a global
repository, the core component of a data warehouse.
PowerMart includes all features except distributed metadata, multiple registered servers, and
data partitioning. Also, the various options available with PowerCenter (such as PowerCenter
Integration Server for BW, PowerConnect for IBM DB2, PowerConnect for IBM MQSeries,
PowerConnect for SAP R/3, PowerConnect for Siebel, and PowerConnect for PeopleSoft) are not
available with PowerMart.
We can create shortcuts to objects in shared folders. Shortcuts provide the easiest way to reuse
objects. We use a shortcut as if it were the actual object, and when we make a change to the
original object, all shortcuts inherit the change.
Shortcuts to folders in the same repository are known as local shortcuts. Shortcuts to the global
repository are called global shortcuts.
Sessions and batches store information about how and when the Informatica Server moves data
through mappings. You create a session for each mapping you want to run. You can group
several sessions together in a batch. Use the Server Manager to create sessions and batches.
You can design a transformation to be reused in multiple mappings within a folder, a repository,
or a domain. Rather than recreate the same transformation each time, you can make the
transformation reusable, then add instances of the transformation to individual mappings. Use the
Transformation Developer tool in the Designer to create reusable transformations.
What is metadata?
Designing a data mart involves writing and storing a complex set of instructions. You need to
know where to get data (sources), how to change it, and where to write the information (targets).
PowerMart and PowerCenter call this set of instructions metadata. Each piece of metadata (for
example, the description of a source table in an operational database) can contain comments
about it.
In summary, metadata can include information such as mappings describing how to transform
source data, sessions indicating when you want the Informatica Server to perform the
transformations, and connect strings for sources and targets.
What is an ER diagram?
ER stands for entity relationship. An ER diagram is the first step in the design of a data
model, which later leads to the physical design of a database, possibly an OLTP or OLAP database.
16. What is the exact difference between joiner and lookup transformation?
17. What is inline view?
19. Is it possible to execute workflows in different repositories at the same time using the same
Informatica
21. How do you parse characters using functions in the Expression transformation? For example, if a
column has a value like mgr=a, I have to parse out the characters 'mgr='. Which function should I
use?
25. What is an ODS? What is the purpose of an ODS? It is a logical database that stores data
extracted from the source.
27. We can insert or update the rows without using the update strategy. Then what is the
necessity of the update strategy?
29. What is the purpose of using UNIX commands in Informatica? Which UNIX commands are
generally used with Informatica?
A Materialized View is effectively a database table that contains the results of a query. The power
of materialized views comes from the fact that, once created, Oracle can automatically
synchronize a materialized view's data with its source information as required with little or no
programming effort.
• Denormalization
• Validation
• Data Warehousing
• Replication.
Starting with Oracle 8.1.5, introduced in March 1999, you can have a materialized view, also
known as a summary. Like a regular view, a materialized view can be used to build a black-box
abstraction for the programmer. In other words, the view might be created with a complicated
JOIN, or an expensive GROUP BY with sums and averages. With a regular view, this expensive
operation would be done every time you issued a query. With a materialized view, the expensive
operation is done when the view is created and thus an individual query need not involve
substantial computation.
Materialized views consume space because Oracle is keeping a copy of the data or at least a
copy of information derivable from the data. More importantly, a materialized view does not
contain up-to-the-minute information. When you query a regular view, your results include
changes made up to the last committed transaction before your SELECT. When you query a
materialized view, you're getting results as of the time that the view was created or refreshed.
Note that Oracle lets you specify a refresh interval at which the materialized view will
automatically be refreshed.
At this point, you'd expect an experienced Oracle user to say "Hey, these aren't new. This is the
old CREATE SNAPSHOT facility that we used to keep semi-up-to-date copies of tables on
machines across the network!" What is new with materialized views is that you can create them
with the ENABLE QUERY REWRITE option. This authorizes the SQL parser to look at a query
involving aggregates or JOINs and go to the materialized view instead. Consider the following
query, from the ArsDigita Community System's /admin/users/registration-history.tcl page:
select
to_char(registration_date,'YYYYMM') as sort_key,
rtrim(to_char(registration_date,'Month')) as pretty_month,
to_char(registration_date,'YYYY') as pretty_year,
count(*) as n_new
from users
group by
to_char(registration_date,'YYYYMM'),
to_char(registration_date,'Month'),
to_char(registration_date,'YYYY')
order by 1;
select count(*)
from users
where rtrim(to_char(registration_date,'Month')) = 'January'
and to_char(registration_date,'YYYY') = '1999'
Oracle would ignore the users table altogether and pull information from users_by_month. This
would give the same result with much less work. Suppose that the current month is March 1998,
though. The query
select count(*)
from users
where rtrim(to_char(registration_date,'Month')) = 'March'
and to_char(registration_date,'YYYY') = '1998'
will also hit the materialized view rather than the users table and hence will miss anyone who
has registered since midnight (i.e., the query rewriting will cause a different result to be returned).
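SQLite has no materialized views and no query rewrite. Even so, the staleness problem described above can be imitated with a summary table built once from `users` and never refreshed. The sketch below is only an analogy, not Oracle behaviour. It keeps the names `users` and `users_by_month` from the text, and the data is made up.

```python
import sqlite3

def stale_summary_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (user_id INTEGER, registration_date TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(1, "1998-03-02"), (2, "1998-03-10"), (3, "1999-01-05")])
    # Stand-in for the materialized view: a summary computed once, at creation time.
    conn.execute("""CREATE TABLE users_by_month AS
                    SELECT strftime('%Y%m', registration_date) AS sort_key, COUNT(*) AS n_new
                    FROM users GROUP BY sort_key""")
    # Someone registers after the summary was built; nothing refreshes it.
    conn.execute("INSERT INTO users VALUES (4, '1998-03-20')")
    from_base = conn.execute("""SELECT COUNT(*) FROM users
                                WHERE strftime('%Y%m', registration_date) = '199803'""").fetchone()[0]
    from_summary = conn.execute("""SELECT n_new FROM users_by_month
                                   WHERE sort_key = '199803'""").fetchone()[0]
    conn.close()
    return from_base, from_summary

if __name__ == "__main__":
    print(stale_summary_demo())  # the summary misses the late registration
```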
Suppose there are 100,000 rows in the source and 20,000 rows are loaded to the target. If the session
stops after loading 20,000 rows, how will you load the remaining rows?
Use Perform Recovery to load the records from the point where the session failed.
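Why is Sorter an active transformation? What happens when you uncheck the Distinct option
in the Sorter?
Sorter is an active transformation because, when the Distinct option is checked, it eliminates
duplicate records, so the number of output rows can be smaller than the number of input rows.
When Distinct is unchecked, the Sorter only sorts the data and passes every row through.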
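The row-count difference is easy to see in a small model. This Python sketch is not Informatica code; the sorter function and the two-column rows are assumptions for illustration. It sorts on all ports (which is what the Distinct option does) and drops adjacent duplicates only when distinct is set.

```python
from typing import Iterable, List, Tuple

Row = Tuple[int, str]  # hypothetical two-port row

def sorter(rows: Iterable[Row], distinct: bool = False) -> List[Row]:
    """Sort rows on all ports; optionally drop duplicate rows (models the Distinct option)."""
    ordered = sorted(rows)
    if not distinct:
        return ordered  # same number of rows in and out
    out: List[Row] = []
    for row in ordered:
        # After sorting, duplicates are adjacent, so comparing with the last kept row suffices.
        if not out or out[-1] != row:
            out.append(row)
    return out

rows = [(2, "B"), (1, "A"), (2, "B"), (3, "C")]
print(len(sorter(rows)))                 # 4
print(len(sorter(rows, distinct=True)))  # 3
```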
Get a related value: get the Employee Name from the Employee table based on the Employee ID.
Perform a calculation.
Update slowly changing dimension tables - We can use unconnected lookup transformation to
file. The developer defines the lookup match criteria. There are two types of Lookups in
Assignment, Command, Control, Decision, Email, Event-Raise, Event-Wait, Timer, Session
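What is the method for loading 5 flat files with the same structure into a single target, and
which transformations will you use?
This can be handled by using a file list in Informatica. Suppose we have 5 files in different
locations on the server and we need to load them into a single target table. In the session
properties, set the source file type to Indirect and give, as the source file name, a list file that
contains the full path of each file, one per line: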
/ftp_data/webrep/SrcFiles/abc.txt
/ftp_data/webrep/bcd.txt
/ftp_data/webrep/srcfilesforsessions/xyz.txt
/ftp_data/webrep/SrcFiles/uvw.txt
/ftp_data/webrep/pqr.txt
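Conceptually, an indirect source reads the list file, then reads every file it names as one row stream. The Python sketch below models only that idea; it is not Informatica code, and the read_indirect function, the comma delimiter, and the temporary file names are assumptions for illustration.

```python
import csv
import tempfile
from pathlib import Path
from typing import List

def read_indirect(list_file: Path) -> List[List[str]]:
    """Read every source file named in list_file (one path per line) into one row stream."""
    rows: List[List[str]] = []
    for line in list_file.read_text().splitlines():
        path = line.strip()
        if not path:
            continue  # ignore blank lines in the list file
        with open(path, newline="") as f:
            rows.extend(csv.reader(f))  # assumes all files share the same comma-delimited layout
    return rows

# Demo with three temporary files of identical structure (hypothetical names).
tmp = Path(tempfile.mkdtemp())
for name, data in [("abc.txt", "1,A\n2,B\n"), ("bcd.txt", "3,C\n"), ("xyz.txt", "4,D\n")]:
    (tmp / name).write_text(data)
list_file = tmp / "file_list.txt"
list_file.write_text("\n".join(str(tmp / n) for n in ["abc.txt", "bcd.txt", "xyz.txt"]) + "\n")

print(read_indirect(list_file))  # [['1', 'A'], ['2', 'B'], ['3', 'C'], ['4', 'D']]
```

Because the files share one structure, a single source definition and Source Qualifier can serve all of them.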
How do you identify existing rows of data in the target table using a Lookup transformation?
There are two ways to look up the target table to verify whether a row exists:
1. Use a connected Lookup with a dynamic cache, and check the value of the NewLookupRow output
port to decide whether the incoming record already exists in the table/cache.
2. Use an unconnected Lookup, call it from an Expression transformation, and check whether the
value it returns is Null or Not Null to decide whether the incoming record already exists in the target.
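The NewLookupRow values can be modelled directly: 0 means the row was neither inserted into nor updated in the cache, 1 means it was inserted, and 2 means it was updated. The Python sketch below is a model, not the Informatica API; the dynamic_lookup function is hypothetical, and it assumes the lookup is configured to insert new keys and update changed ones.

```python
from typing import Dict, Iterable, List, Tuple

def dynamic_lookup(cache: Dict[int, str], rows: Iterable[Tuple[int, str]]) -> List[int]:
    """Return a NewLookupRow-style flag per input row, updating the cache as rows pass (model only)."""
    flags: List[int] = []
    for key, value in rows:
        if key not in cache:
            cache[key] = value   # new key: insert into the cache
            flags.append(1)      # 1 = inserted
        elif cache[key] != value:
            cache[key] = value   # existing key with changed data: update the cache
            flags.append(2)      # 2 = updated
        else:
            flags.append(0)      # 0 = no change; the row already exists
    return flags

cache = {101: "Smith"}  # assumed cache initially built from the target table
print(dynamic_lookup(cache, [(101, "Smith"), (102, "Jones"), (101, "Smyth"), (102, "Jones")]))
# [0, 1, 2, 0]
```

Downstream, a Router or Update Strategy would typically insert the rows flagged 1 and update the rows flagged 2.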
The Source Qualifier converts the source datatypes to Informatica-readable (native) datatypes.
We can do a mapping without a Source Qualifier; in that case the datatypes of the source columns
What are the Tracking levels in Informatica transformations? Which one is efficient, which one is
faster, and which one is best in Informatica PowerCenter 8.1/8.5?
I guess you are asking about the tracing level. When you configure a transformation, you
can set the amount of detail the Integration Service writes in the session log.
1. Normal: Integration Service logs initialization and status information, errors encountered,
and skipped rows due to transformation row errors. Summarizes session results, but not at the
level of individual rows.
2. Terse: Integration Service logs initialization information and error messages and notification
of rejected data.
3. Verbose Initialization: In addition to normal tracing, Integration Service logs additional
initialization details, names of index and data files used, and detailed transformation statistics.
4. Verbose Data: In addition to verbose initialization tracing, Integration Service logs each
row that passes into the mapping. Also notes where the Integration Service truncates string
data to fit the precision of a column and provides detailed transformation statistics.
Allows the Integration Service to write errors to both the session log and error log when you
enable row error logging.
Unconnected lookup should be used when we need to call the same lookup multiple times in one
mapping. For example, in a parent-child relationship you need to pass multiple child IDs to
One can argue that this can be achieved by creating a reusable lookup as well. That's true,
but reusable components are created when the need is across mappings, not within one mapping.
Also, if we use connected lookup multiple times in a mapping, by default the cache would be
persistent.
How do you handle error logic in Informatica? What are the transformations that you used
while handling errors? How did you reload those error records in target?
Column indicator:
D - valid
O - overflow
N - null
T - truncated
When the data contains nulls or overflows, the row is rejected instead of being written to the target.
The rejected data is stored in reject files. You can check the data and reload the data into
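Before reloading, it helps to pick out which columns caused each rejection. The Python sketch below assumes a simplified comma-delimited layout (a row indicator first, then value/column-indicator pairs); check your own reject files, because the exact layout may differ. The row indicator meanings follow the DD_INSERT through DD_REJECT numbering used for Update Strategy, and the function names are hypothetical.

```python
from typing import List, Tuple

# Assumed layout: row indicator, then alternating value, column indicator.
ROW_INDICATORS = {"0": "insert", "1": "update", "2": "delete", "3": "reject"}
COLUMN_INDICATORS = {"D": "valid", "O": "overflow", "N": "null", "T": "truncated"}

def parse_reject_line(line: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split one reject-file line into (row type, [(value, column status), ...])."""
    fields = line.rstrip("\n").split(",")
    row_type = ROW_INDICATORS.get(fields[0], "unknown")
    rest = fields[1:]
    columns = [(rest[i], COLUMN_INDICATORS.get(rest[i + 1], "unknown"))
               for i in range(0, len(rest) - 1, 2)]
    return row_type, columns

def bad_columns(line: str) -> List[int]:
    """Positions of columns that need fixing before the row is reloaded."""
    _, columns = parse_reject_line(line)
    return [i for i, (_, status) in enumerate(columns) if status != "valid"]

line = "0,1921,D,,N,William,D"
print(parse_reject_line(line))
# ('insert', [('1921', 'valid'), ('', 'null'), ('William', 'valid')])
print(bad_columns(line))  # [1]
```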
You would not be able to track the changes done to the respective
mappings/sessions/workflows.
What are DTM buffer size and default buffer block size? If a performance issue occurs in a
session, which one should we increase and which one should we decrease?
DTM buffer size is the memory you allocate to the DTM process (12 MB by default).
Buffer block size is the row size of the heaviest source/target * the number of rows that can be moved at a
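The two settings can be related with a little arithmetic. The formulas below are an assumption based on commonly cited Informatica tuning rules of thumb, not on this text: roughly 90% of the DTM buffer is divided into blocks of the buffer block size, each source and each target needs at least two blocks, and the largest row decides how many rows fit in one block. The 64 KB block size and the 20 sources and 20 targets are illustrative values, and only a single partition is modelled.

```python
# Rule-of-thumb sizing arithmetic (assumed formulas; single partition only).

def available_blocks(dtm_buffer_bytes: int, block_bytes: int) -> int:
    # About 90% of the DTM buffer is carved into blocks of block_bytes each.
    return (9 * dtm_buffer_bytes) // (10 * block_bytes)

def required_blocks(n_sources: int, n_targets: int) -> int:
    # Each source and each target needs at least two blocks.
    return (n_sources + n_targets) * 2

def rows_per_block(block_bytes: int, largest_row_bytes: int) -> int:
    # The largest source/target row decides how many rows fit in one block.
    return block_bytes // largest_row_bytes

dtm = 12_000_000  # 12 MB DTM buffer
block = 64_000    # assumed 64 KB buffer block size
print(available_blocks(dtm, block))   # 168
print(required_blocks(20, 20))        # 80
print(rows_per_block(block, 2_000))   # 32
```

Read this way, too few available blocks suggests increasing the DTM buffer size, while very wide rows (few rows per block) suggest increasing the buffer block size.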
Suppose we set a target-based commit interval of 1000; the Informatica Server then applies a commit
If we set a source-based commit interval of 1000 and, due to transformation logic, 500 rows are
dropped, then only 500 rows are inserted into the target table, and the Informatica Server applies
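The difference between the two commit types can be simulated by counting rows. This Python sketch is a simplification under stated assumptions: source-based commits are counted on rows read from the source, target-based commits on rows that reach the target, and the real Integration Service may commit target-based data slightly after the interval because it waits for a buffer block boundary, which the sketch ignores. The function names are hypothetical.

```python
from typing import List, Sequence

def source_based_commits(passes: Sequence[bool], interval: int) -> List[int]:
    """Rows committed at each commit point when the interval counts source rows (simplified)."""
    commits: List[int] = []
    pending = 0
    for n, passed in enumerate(passes, start=1):
        if passed:
            pending += 1            # row reaches the target
        if n % interval == 0:       # commit point counted on rows read from the source
            commits.append(pending)
            pending = 0
    if pending:
        commits.append(pending)     # final commit at end of load
    return commits

def target_based_commits(passes: Sequence[bool], interval: int) -> List[int]:
    """Rows committed at each commit point when the interval counts target rows (simplified)."""
    commits: List[int] = []
    pending = 0
    for passed in passes:
        if not passed:
            continue                # dropped rows never count toward a target-based commit
        pending += 1
        if pending == interval:
            commits.append(pending)
            pending = 0
    if pending:
        commits.append(pending)
    return commits

# 2,000 source rows; transformation logic drops every other row.
passes = [i % 2 == 0 for i in range(2000)]
print(source_based_commits(passes, 1000))  # [500, 500]
print(target_based_commits(passes, 1000))  # [1000]
```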
from the left-hand side under the Mapplets subfolder to the Mapplet Designer workspace, it
won't allow you to do so, but if you try to drag and drop a mapplet into a mapping, i.e.,
in the Mapping Designer, then it comes into the workspace. This means a mapplet can only be used
in a mapping and can't be used in another mapplet. That's why a mapplet is known as the
For SQ Transformation when I am writing a custom Query, do I need to have all the From tables
as part of the mapping? That is say I have 3 from tables in the Custom Query, do I need to import
all 3 tables in the mapping. All 3 tables are from same database schema.
Please assist,
No, you do not need to import all the tables. Just take care of the field names, field order,
lengths and datatypes, and define the join condition between the tables properly as
part of the custom query.
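The point about field order can be checked mechanically: the custom query's SELECT list has to line up with the Source Qualifier ports. The Python/sqlite3 sketch below is only an illustration, with SQLite standing in for the source database; the three tables, the SQ_PORTS list, and all column names are hypothetical.

```python
import sqlite3

# Hypothetical Source Qualifier ports, in order.
SQ_PORTS = ["order_id", "customer_name", "product_name"]

# Custom query joining three tables; only the SELECT list has to match the ports.
SQL_OVERRIDE = """
SELECT o.order_id AS order_id, c.customer_name AS customer_name, p.product_name AS product_name
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
JOIN products  p ON p.product_id  = o.product_id
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, customer_name TEXT);
CREATE TABLE products  (product_id INTEGER, product_name TEXT);
CREATE TABLE orders    (order_id INTEGER, customer_id INTEGER, product_id INTEGER);
INSERT INTO customers VALUES (1, 'Acme');
INSERT INTO products  VALUES (10, 'Widget');
INSERT INTO orders    VALUES (100, 1, 10);
""")

cur = conn.execute(SQL_OVERRIDE)
columns = [d[0] for d in cur.description]
# The override's columns must match the SQ ports by position.
assert columns == SQ_PORTS, f"override returns {columns}, SQ expects {SQ_PORTS}"
print(cur.fetchall())  # [(100, 'Acme', 'Widget')]
```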
Q. What is a mapplet?
A. A mapplet is a reusable object that is created using the Mapplet Designer. The mapplet contains
a set of transformations and allows us to reuse that transformation logic in multiple mappings.
Q. What does reusable transformation mean?
A. Reusable transformations can be used multiple times, within one mapping or across mappings. The
reusable transformation is stored as metadata separate from any mapping that uses it. Whenever
changes are made to a reusable transformation, all the mappings that use it inherit the changes,
and changes such as deleting or renaming a port can invalidate those mappings.
Q. What is update strategy and what are the options for update strategy?
A. Informatica processes the source data row by row. By default every row is marked to be
inserted in the target table. If a row has to be updated or inserted based on some logic, an Update
Strategy transformation is used. The condition is specified in the Update Strategy to flag the
processed row for update or insert.
The following options are available for update strategy:
• DD_INSERT : If this is used the Update Strategy flags the row for insertion. Equivalent numeric
value of DD_INSERT is 0.
• DD_UPDATE : If this is used the Update Strategy flags the row for update. Equivalent numeric
value of DD_UPDATE is 1.
• DD_DELETE : If this is used the Update Strategy flags the row for deletion. Equivalent numeric
value of DD_DELETE is 2.
• DD_REJECT : If this is used the Update Strategy flags the row for rejection. Equivalent numeric
value of DD_REJECT is 3.
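The numeric equivalents make the flags easy to model. Below is a minimal Python sketch, not Informatica code: the UpdateFlag enum mirrors the four constants, and the hypothetical flag_row function stands in for an Update Strategy expression such as IIF(ISNULL(lkp_key), DD_INSERT, DD_UPDATE), where lkp_key is an assumed lookup return port and the validity check is an added illustration.

```python
from enum import IntEnum

class UpdateFlag(IntEnum):
    DD_INSERT = 0
    DD_UPDATE = 1
    DD_DELETE = 2
    DD_REJECT = 3

def flag_row(exists_in_target: bool, is_valid: bool) -> UpdateFlag:
    # Models a hypothetical expression: reject invalid rows,
    # otherwise IIF(ISNULL(lkp_key), DD_INSERT, DD_UPDATE).
    if not is_valid:
        return UpdateFlag.DD_REJECT
    return UpdateFlag.DD_UPDATE if exists_in_target else UpdateFlag.DD_INSERT

print([int(flag_row(e, v)) for e, v in [(False, True), (True, True), (True, False)]])
# [0, 1, 3]
```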
DW is a way of storing data and creating information by leveraging data marts. DMs are
segments or categories of information and/or data that are grouped together to provide
'information' into that segment or category. DW does not require BI to work. Reporting tools can
generate reports from the DW.
Q. What are the data modeling tools you have used? (Polaris)
During the physical design process, you convert the data gathered during the logical design
phase into a description of the physical database, including tables and constraints.
A logical design is a conceptual and abstract design. We do not deal with the physical
implementation details yet; we deal only with defining the types of information that we need.
The process of logical design involves arranging data into a series of logical relationships called
entities and attributes.
An attribute is a component of an entity and helps define the uniqueness of the entity. In relational
databases, an attribute maps to a column.
Entity-Relationship.
ETL is the data warehouse acquisition process of Extracting, Transforming (or Transporting)
and Loading (ETL) data from source systems into the data warehouse.
E.g. Oracle Warehouse Builder, PowerMart.
Q. How do you extract data from different data sources explain with an example? (Polaris)
Q. What are the reporting tools you have used? What is the difference between them? (Polaris)
Q. Without using ETL tool can u prepare a Data Warehouse and maintain? (Polaris)
Data Mining is the process of automated extraction of predictive information from large
databases. It predicts future trends and finds behaviour that the experts may miss as it lies
beyond their expectations. Data Mining is part of a larger process called knowledge discovery;
specifically, the step in which advanced statistical analysis and modeling techniques are applied
to the data to find useful patterns and relationships.
Data mining can be defined as "a decision support process in which we search for patterns of
information in data." This search may be done just by the user, i.e. just by performing queries, in
which case it is quite hard and in most of the cases not comprehensive enough to reveal intricate
patterns. Data mining uses sophisticated statistical analysis and modeling techniques to uncover
such patterns and relationships hidden in organizational databases – patterns that ordinary
methods might miss. Once found, the information needs to be presented in a suitable form, with
graphs, reports, etc.
OLAP is software for manipulating multidimensional data from a variety of sources. The data is
often stored in data warehouse. OLAP software helps a user create queries, views,
representations and reports. OLAP tools can provide a "front-end" for a data-driven DSS.
On-Line Analytical Processing (OLAP) is a category of software technology that enables analysts,
managers and executives to gain insight into data through fast, consistent, interactive access to a
wide variety of possible views of information that has been transformed from raw data to reflect
the real dimensionality of the enterprise as understood by the user.
OLAP functionality is characterized by dynamic multi-dimensional analysis of consolidated
enterprise data supporting end user analytical and navigational activities
Q. What are the different types of OLAP? What are their differences? (Mascot)
ROLAP, MOLAP and HOLAP are specialized OLAP (Online Analytical Processing) applications.
ROLAP stands for Relational OLAP. Users see their data organized in cubes with dimensions,
but the data is really stored in a relational database (RDBMS) like Oracle. The RDBMS stores
data at a fine grain level, so response times are usually slow.
MOLAP stands for Multidimensional OLAP. Users see their data organized in cubes with
dimensions, but the data is stored in a multidimensional database (MDBMS) like Oracle Express
Server. In a MOLAP system a lot of queries have a finite answer, and performance is critical and
usually fast.
HOLAP stands for Hybrid OLAP; it is a combination of both worlds. Seagate Software's “Holos” is
an example HOLAP environment. In a HOLAP system one will find queries on aggregated data
as well as on detailed data.
DOLAP
Q. What is the difference between data warehousing and OLAP?
The terms data warehousing and OLAP are often used interchangeably. As the definitions
suggest, warehousing refers to the organization and storage of data from a variety of sources so
that it can be analyzed and retrieved easily. OLAP deals with the software and the process of
analyzing data, managing aggregations, and partitioning information into cubes for in-depth
analysis, retrieval and visualization. Some vendors are replacing the term OLAP with the terms
analytical software and business intelligence.
Q. Many Suppliers – Many Products Model the above scenario in Erwin. How many tables and
what do they contain (Honeywell)
Q. Aggregate navigation
Q. How do I set the log level higher for more detailed information within Data Warehouse Center
7.2?
Within DWC, log level capability can be set from 0 to 4. There is a log level 5, yet it cannot be
turned on using the GUI, but must be turned on manually. A command line trace can be used for
any trace level, and this is the only way to turn on a level 5 trace:
Be sure to reset the trace level to 0 using the command line when you are done:
db2 => update iwh.configuration set value_int = 0 where name =
'TRACELVL'
and (component = '')
When you run a trace, the Data Warehouse Center writes information to text files. Data
Warehouse Center programs that are called from steps also write any trace information to this
directory. These files are located in the directory specified by the VWS_LOGGING environment
variable.
The Data Warehouse Center supports a wide variety of relational and non-relational data
sources. You can populate your Data Warehouse Center warehouse with data from the following
databases and files:
Any DB2 family database
Oracle
Sybase
Informix
Microsoft SQL Server
IBM DataJoiner
Multiple Virtual Storage (OS/390), Virtual Machine (VM), and local area network (LAN) files
IMS and Virtual Storage Access Method (VSAM) (with DataJoiner Classic Connect)
When you install the warehouse server, the warehouse control database that you specify during
installation is initialized. Initialization is the process in which the Data Warehouse Center creates
the control tables that are required to store Data Warehouse Center metadata. If you have more
than one warehouse control database, you can use the Data Warehouse Center -->
Control Database Management window to initialize the second warehouse control database.
However, only one warehouse control database can be active at a time.
Q. What databases need to be registered as system ODBC data sources for the Data Warehouse
Center?
Q. What is the aim/objective of having a data warehouse? And who needs a data warehouse? Or
what is the use of Data Warehousing? (Polaris)
Data warehousing technology comprises a set of new concepts and tools which support the
executives, managers and analysts with information material for decision making.
The fundamental reason for building a data warehouse is to improve the quality of information in
the organization.
The main goal of a data warehouse is to report and present the information in a very user-friendly
form.
Top Down Approach (data warehouse first), Bottom Up (data marts), Enterprise Data Model
(combines both)
A data warehouse system (DWS) comprises the data warehouse and all components used for
building, accessing and maintaining the DWH (illustrated in Figure 1). The center of a data
warehouse system is the data warehouse itself. The data import and preparation component is
responsible for data acquisition. It includes all programs, applications and legacy systems
interfaces that are responsible for extracting data from operational sources, preparing and loading
it into the warehouse. The access component includes all different applications (OLAP or data
mining applications) that make use of the information stored in the warehouse.
Additionally, a metadata management component (not shown in Figure 1) is responsible for the
management, definition and access of all different types of metadata. In general, metadata is
defined as “data about data” or “data describing the meaning of data”. In data warehousing, there
are various types of metadata, e.g., information about the operational sources, the structure and
semantics of the DWH data, the tasks performed during the construction, the maintenance and
access of a DWH, etc. The need for metadata is well known. Statements like “A data warehouse
without adequate metadata is like a filing cabinet stuffed with papers, but without any folders or
labels” characterize the situation. Thus, the quality of metadata and the resulting quality of
information gained using a data warehouse solution are tightly linked.
Implementing a concrete DWS is a complex task comprising two major phases. In the DWS
configuration phase, a conceptual view of the warehouse is first specified according to user
requirements (data warehouse design). Then, the involved data sources and the way data will be
extracted and loaded into the warehouse (data acquisition) is determined. Finally, decisions about
persistent storage of the warehouse using database technology and the various ways data will be
accessed during analysis are made.
After the initial load (the first load of the DWH according to the DWH configuration), during the
DWS operation phase, warehouse data must be regularly refreshed, i.e., modifications of
operational data since the last DWH refreshment must be propagated into the warehouse such
that data stored in the DWH reflect the state of the underlying operational systems. Besides DWH
refreshment, DWS operation includes further tasks like archiving and purging of DWH data or
DWH monitoring.
Data in a data warehouse is subject-oriented rather than application-oriented. It is
designed and constructed as a non-volatile store of business data, transactions and events. Data
warehouse is a logically integrated store of data originating from disparate operational sources.
It is the only source for deriving information needed by the end users. Several temporal modeling
styles are usually used in different areas of the data warehouse.
Data in the DWH is integrated from various, heterogeneous operational systems (like database
systems, flat files, etc.) and further external data sources (like demographic and statistical
databases, WWW, etc.). Before the integration, structural and semantic differences have to be
reconciled, i.e., data have to be “homogenized” according to a uniform data model. Furthermore,
data values from operational systems have to be cleaned in order to get correct data into the data
warehouse.
The need to access historical data (i.e., histories of warehouse data over a prolonged period of
time) is one of the primary incentives for adopting the data warehouse approach. Historical data
are necessary for business trend analysis which can be expressed in terms of understanding the
differences between several views of the real-time data (e.g., profitability at the end of each
month). Maintaining historical data means that periodical snapshots of the corresponding
operational data are propagated and stored in the warehouse without overriding previous
warehouse states. However, the potential volume of historical data and the associated storage
costs must always be considered in relation to their potential business benefits.
Furthermore, warehouse data is mostly non-volatile, i.e., access to the DWH is typically
read-oriented. Modifications of the warehouse data take place only when modifications of the
source data are propagated into the warehouse.
Finally, a data warehouse usually contains additional data, not explicitly stored in the operational
sources, but derived through some process from operational data (also called derived data). For
example, operational sales data could be stored in several aggregation levels (weekly, monthly,
quarterly sales) in the warehouse.
Data warehouses or a more focused database called a data mart should be considered when a
significant number of potential users are requesting access to a large amount of related historical
information for analysis and reporting purposes. So-called active or real-time data warehouses
can provide advanced decision support capabilities.
Q. Database administrators (DBAs) have always said that having non-normalized or de-
normalized data is bad. Why is de-normalized data now okay when it's used for Decision
Support?
Q. How often should data be loaded into a data warehouse from transaction processing and other
source systems?
It all depends on the needs of the users, how fast data changes and the volume of information
that is to be loaded into the data warehouse. It is common to schedule daily, weekly or monthly
dumps from operational data stores during periods of low activity (for example, at night or on
weekends). The longer the gap between loads, the longer the processing times for the load when
it does run. A technical IS/IT staffer should make some calculations and consult with potential
users to develop a schedule to load new data.
Some of the potential benefits of putting data into a data warehouse include:
1. Improving turnaround time for data access and reporting;
2. Standardizing data across the organization so there will be one view of the "truth";
3. Merging data from various source systems to create a more comprehensive information
source;
4. Lowering costs to create and distribute information and reports;
5. Sharing data and allowing others to access and analyze the data;
6. Encouraging and improving fact-based decision making.
The major limitations associated with data warehousing are related to user expectations, lack of
data and poor data quality. Building a data warehouse creates some unrealistic expectations that
need to be managed. A data warehouse doesn't meet all decision support needs. If needed data
is not currently collected, transaction systems need to be altered to collect the data. If data quality
is a problem, the problem should be corrected in the source system before the data warehouse is
built. Software can provide only limited support for cleaning and transforming data. Missing and
inaccurate data cannot be "fixed" using software. Historical data can be collected manually,
coded and "fixed", but at some point source systems need to provide quality data that can be
loaded into the data warehouse without manual clerical intervention.
Build one! The easiest way to get started with data warehousing is to analyze some existing
transaction processing systems and see what type of historical trends and comparisons might be
interesting to examine to support decision making. See if there is a "real" user need for
integrating the data. If there is, then IS/IT staff can develop a data model for a new schema and
load it with some current data and start creating a decision support data store using a database
management system (DBMS). Find some software for query and reporting and build a decision
support interface that's easy to use. Although the initial data warehouse/data-driven DSS may
seem to meet only limited needs, it is a "first step". Start small and build more sophisticated
systems based upon experience and successes.
Production may reuse keys that it has purged but that you are still maintaining
Production might legitimately overwrite some part of a product description or a customer
description with new values but not change the product key or the customer key to a new value.
We might be wondering what to do about the revised attribute values (slowly changing dimension
crisis)
Production may generalize its key format to handle some new situation in the transaction system.
E.g. production keys may change from integers to alphanumeric values, or the 12-byte keys you
are used to may become 20-byte keys
Acquisition of companies
We can save substantial storage space with integer valued surrogate keys
Eliminate administrative surprises coming from production
Potentially adapt to big surprises like a merger or an acquisition
Have a flexible mechanism for handling slowly changing dimensions
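The benefits listed above all come from the warehouse owning its own integer keys. The Python sketch below is a hypothetical SurrogateKeyMap, not a specific tool's API: it issues integers for production keys, keeps an acquired company's keys distinct by pairing them with an assumed source-system code, and hands out a fresh key when a dimension row needs a new version.

```python
from typing import Dict, Hashable

class SurrogateKeyMap:
    """Assign warehouse-owned integer keys to production (natural) keys (illustrative sketch)."""

    def __init__(self) -> None:
        self._keys: Dict[Hashable, int] = {}
        self._next = 1

    def key_for(self, natural_key: Hashable) -> int:
        if natural_key not in self._keys:
            self._keys[natural_key] = self._next  # first time seen: issue a new integer
            self._next += 1
        return self._keys[natural_key]

    def new_version(self, natural_key: Hashable) -> int:
        # Slowly changing dimension: same production key, new surrogate key for the new row.
        self._keys[natural_key] = self._next
        self._next += 1
        return self._keys[natural_key]

keys = SurrogateKeyMap()
print(keys.key_for("CUST-000123"))      # 1
print(keys.key_for(("ACQ", "000123")))  # 2 (acquired company's key, kept distinct)
print(keys.key_for("CUST-000123"))      # 1
print(keys.new_version("CUST-000123"))  # 3
```

If production later widens or reformats its keys, only this mapping changes; fact tables keep their compact integer keys.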
There are two kinds of fact tables that do not have any facts at all.
The first type of factless fact table is a table that records an event. Many event-tracking tables in
dimensional data warehouses turn out to be factless.
E.g. A student tracking system that detects each student attendance event each day.
The second type of factless fact table is called a coverage table. Coverage tables are frequently
needed when a primary fact table in a dimensional data warehouse is sparse.
E.g. A sales fact table that records the sales of products in stores on particular days under each
promotion condition. The sales fact table does answer many interesting questions but cannot
answer questions about things that did not happen. For instance, it cannot answer the question,
“which products were in promotion that did not sell?” because it contains only the records of
products that did sell. In this case the coverage table comes to the rescue. A record is placed in
the coverage table for each product in each store that is on promotion in each time period.
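With a coverage table in place, "which products were on promotion but did not sell?" becomes a set difference. This Python/sqlite3 sketch uses hypothetical table and column names and made-up rows; it subtracts the sales fact rows from the coverage rows on the shared date, product and store keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Coverage (factless) table: one row per product on promotion, per store, per day.
CREATE TABLE promotion_coverage (date_key INTEGER, product_key INTEGER, store_key INTEGER, promotion_key INTEGER);
-- Sales fact table: rows exist only for products that actually sold.
CREATE TABLE sales_fact (date_key INTEGER, product_key INTEGER, store_key INTEGER, promotion_key INTEGER, dollar_sales REAL);
INSERT INTO promotion_coverage VALUES (1, 10, 100, 7), (1, 11, 100, 7), (1, 12, 100, 7);
INSERT INTO sales_fact VALUES (1, 10, 100, 7, 25.0);
""")

# Promoted products that did not sell = coverage rows with no matching sales row.
unsold = conn.execute("""
SELECT date_key, product_key, store_key FROM promotion_coverage
EXCEPT
SELECT date_key, product_key, store_key FROM sales_fact
ORDER BY product_key
""").fetchall()
print(unsold)  # [(1, 11, 100), (1, 12, 100)]
```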
A causal dimension is a kind of advisory dimension that should not change the fundamental grain
of a fact table.
E.g. why did the customer buy the product? It could be due to a promotion, a sale, etc.
Q. What is slicing and dicing? How can we do it in Impromptu? (We cannot.) It is done only in
PowerPlay.
GENERAL
Approximately 900GB.
Q. What is the daily data volume (in GB/records)? Or What is the size of the data extracted in the
extraction process? (Polaris)
Q. How many Fact and Dimension tables are there in your project?
Q. What is the size of Fact table in your project?
Q. How many dimension tables did you have in your project, and name some dimensions
(columns)? (Mascot)
Q. How many Facts & Dimension Tables are there in your Project? (Mascot)
OLAP - Online Analytical Processing: mainly required for DSS; data is denormalized, mostly
non-volatile and highly indexed to improve query response time.
OLTP - Online Transaction Processing: DML-heavy; highly normalized to reduce deadlocks and increase
concurrency.
Data warehouses can have many different types of life cycles with independent data marts. The
following is an example of a data warehouse life cycle.
In the life cycle of this example, four important steps are involved.
Extraction - As a first step, heterogeneous data from different online transaction processing
systems is extracted. This data becomes the data source for the data warehouse.
Cleansing/transformation - The source data is sent into the populating systems where the data is
cleansed, integrated, consolidated, secured and stored in the corporate or central data
warehouse.
Distribution - From the central data warehouse, data is distributed to independent data marts
specifically designed for the end user.
Analysis - From these data marts, data is sent to the end users who access the data stored in the
data mart depending upon their requirement.
Q. What are the different Reporting and ETL tools available in the market?
A data warehouse is a database designed to support a broad range of decision tasks in a specific
organization. It is usually batch updated and structured for rapid online queries and managerial
summaries. Data warehouses contain large amounts of historical data derived from
transaction data, but they can also include data from other sources. It is designed for query and
analysis rather than for transaction processing.
The term data warehousing is often used to describe the process of creating, managing and
using a data warehouse.
A data mart is a selected part of the data warehouse which supports specific decision support
application requirements of a company’s department or geographical region. It usually contains
simple replicates of warehouse partitions or data that has been further summarized or derived
from base warehouse data. Instead of running ad hoc queries against a huge data warehouse,
data marts allow the efficient execution of predicted queries over a significantly smaller database.
Q. How do I differentiate between a data warehouse and a data mart? (KPIT Infotech Pune,
Mascot)
A data warehouse is for very large databases (VLDBs) and a data mart is for smaller databases.
The difference lies in the scope of the things with which they deal.
A data mart is an implementation of a data warehouse with a small and more tightly restricted
scope of data and data warehouse functions. A data mart serves a single department or part of
an organization. In other words, the scope of a data mart is smaller than that of the data warehouse. It is
a data warehouse for a smaller group of end users.
The star schema and OLAP cube are intimately related. Star schemas are most appropriate for
very large data sets. OLAP cubes are most appropriate for smaller data sets where analytic tools
can perform complex data comparisons and calculations. In almost all OLAP cube environments,
it’s recommended that you originally source data into a star schema structure, and then use
wizards to transform the data into the OLAP cube.
Compared to entity/relation modeling, it's less rigorous (allowing the designer more discretion in
organizing the tables) but more practical because it accommodates database complexity and
improves performance.
A fact table in a pure star schema consists of multiple foreign keys, each paired with a primary
key in a dimension, together with the facts containing the measurements.
Every foreign key in the fact table has a match to a unique primary key in the respective
dimension (referential integrity). This allows the dimension table to possess primary keys that
aren’t found in the fact table. Therefore, a product dimension table might be paired with a sales
fact table in which some of the products are never sold.
Dimensional models are full-fledged relational models, where the fact table is in third normal form
and the dimension tables are in second normal form.
The main difference between second and third normal form is that repeated entries are removed
from a second normal form table and placed in their own “snowflake”. Thus the act of removing
the context from a fact record and creating dimension tables places the fact table in third normal
form.
Fact tables are usually huge, and we almost never fetch a single record from them into our answer set.
We fetch a very large number of records on which we then do adding, counting, averaging, or
taking the min or max. The most common of them is adding. Applications are simpler if they store
facts in an additive format as often as possible. Thus, in the grocery example, we don’t need to
store the unit price. We compute the unit price by dividing the dollar sales by the unit sales
whenever necessary.
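A short worked example shows why storing the additive facts and deriving the price afterwards matters. The two fact rows below are made-up numbers: summing dollars and units first gives the true unit price, while averaging stored per-row prices gives a different, wrong answer.

```python
# Each tuple is one fact row: (dollar_sales, unit_sales). Numbers are made up.
rows = [(10.0, 10), (90.0, 30)]

total_dollars = sum(d for d, _ in rows)   # additive fact: safe to sum
total_units = sum(u for _, u in rows)     # additive fact: safe to sum
unit_price = total_dollars / total_units  # derive the price only after summing

# Averaging per-row unit prices weights each row equally and gives the wrong answer.
naive = sum(d / u for d, u in rows) / len(rows)
print(unit_price, naive)  # 2.5 2.0
```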
Some facts, like bank balances and inventory levels, represent intensities that are awkward to
express in an additive format. We can treat these semi-additive facts as if they were additive, but
just before presenting the results to the end user, divide the answer by the number of time
periods to get the right result. This technique is called averaging over time.
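Averaging over time can be shown with month-end balances. In this sketch the accounts and figures are made up: balances are added across accounts within each month, which is meaningful, and the grand total across months is then divided by the number of months.

```python
from collections import defaultdict

# (month, account, month-end balance): made-up numbers for illustration.
balances = [
    ("2024-01", "A", 100.0), ("2024-01", "B", 50.0),
    ("2024-02", "A", 120.0), ("2024-02", "B", 70.0),
    ("2024-03", "A", 140.0), ("2024-03", "B", 30.0),
]

# Adding across accounts within one period is meaningful.
per_month = defaultdict(float)
for month, _, bal in balances:
    per_month[month] += bal

# Adding across time is not, so divide the total by the number of periods.
n_periods = len(per_month)
avg_total_balance = sum(per_month.values()) / n_periods
print(dict(per_month))    # {'2024-01': 150.0, '2024-02': 190.0, '2024-03': 170.0}
print(avg_total_balance)  # 170.0
```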
When the enterprise decides to create a set of common labels across all the sources of data, the
separate data mart teams (or, single centralized team) must sit down to create master
dimensions that everyone will use for every data source. These master dimensions are called
Conformed Dimensions.
Two dimensions are conformed if the fields that you use as row headers have the same domain.
Q. What is a Conformed Fact?
If the definitions of measurements (facts) are highly consistent, we call them Conformed Facts.
We often call a row header a “grouping column” because everything in the list that’s not
aggregated with an operator such as SUM must be mentioned in the SQL GROUP BY clause. So
the GROUP BY clause in the second query reads, GROUP BY MANUFACTURER, BRAND.
Drilling Across adds more data to an existing row. If drilling down is requesting ever finer and
granular data from the same fact table, then drilling across is the process of linking two or more
fact tables at the same granularity, or, in other words, tables with the same set of grouping
columns and dimensional constraints.
A drill across report can be created by using grouping columns that apply to all the fact tables
used in the report.
The new fact table called for in the drill-across operation must share certain dimensions with the
fact table in the original query. All fact tables in a drill-across query must use conformed
dimensions.
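A drill-across report can be built by running one query per fact table, grouped on a column from the shared (conformed) dimension, and merging the answer sets on that column. This Python/sqlite3 sketch uses hypothetical tables and made-up rows; the f-string only interpolates fixed table and column names from the script itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, brand TEXT);  -- conformed dimension
CREATE TABLE shipments_fact (product_key INTEGER, units_shipped INTEGER);
CREATE TABLE sales_fact (product_key INTEGER, units_sold INTEGER);
INSERT INTO product_dim VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
INSERT INTO shipments_fact VALUES (1, 100), (2, 40), (3, 10);
INSERT INTO sales_fact VALUES (1, 80), (2, 25);
""")

def by_brand(fact_table: str, measure: str) -> dict:
    # One query per fact table, grouped on the shared dimension's column.
    # Table and column names are trusted constants from this script, not user input.
    sql = (f"SELECT d.brand, SUM(f.{measure}) FROM {fact_table} f "
           f"JOIN product_dim d ON d.product_key = f.product_key GROUP BY d.brand")
    return dict(conn.execute(sql).fetchall())

shipped = by_brand("shipments_fact", "units_shipped")
sold = by_brand("sales_fact", "units_sold")

# Outer-join the two answer sets on the common grouping column in the client.
report = {b: (shipped.get(b), sold.get(b)) for b in sorted(set(shipped) | set(sold))}
print(report)  # {'Alpha': (100, 80), 'Beta': (40, 25), 'Gamma': (10, None)}
```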
If drilling down is adding grouping columns from the dimension tables, then drilling up is
subtracting grouping columns.
Q. What is meant by Drilling Around?
The final variant of drilling is drilling around a value circle. This is similar to the linear value chain
that I showed in the previous example, but occurs in a data warehouse where the related fact
tables that share common dimensions are not arranged in a linear order. The best example is
from health care, where as many as 10 separate entities are processing patient encounters, and
are sharing this information with one another.
For example, a typical health care value circle has 10 separate entities surrounding the patient.
When the common dimensions are conformed and the requested grouping columns are drawn
from dimensions that tie to all the fact tables in a given report, you can generate really powerful
drill around reports by performing separate queries on each fact table and outer joining the
answer sets in the client tool.
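A minimal Python sketch of the client-side outer join, assuming three hypothetical answer sets already returned by separate queries, each grouped by the same conformed patient column. A patient may be missing from some answer sets, so the join keeps every key and fills the gaps with None.

```python
# Hypothetical answer sets from separate queries on three fact tables,
# each keyed by the same conformed grouping column (patient id).
clinic_visits = {"P1": 3, "P2": 1}
lab_tests = {"P1": 5, "P3": 2}
pharmacy_fills = {"P2": 4}

def outer_join(*answer_sets):
    """Full outer join of answer sets on their shared key; missing values become None."""
    keys = sorted(set().union(*answer_sets))
    return {k: tuple(s.get(k) for s in answer_sets) for k in keys}

report = outer_join(clinic_visits, lab_tests, pharmacy_fills)
for patient, row in report.items():
    print(patient, row)
```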
Time dimension columns:
Time_key
Day_of_week
Day_number_in_month
Day_number_overall
Month
Month_number_overall
Quarter
Fiscal_period
Season
Holiday_flag
Weekday_flag
Last_day_in_month_flag
The time stamp in a fact table should be a surrogate key instead of a real date because:
the rare timestamp that is inapplicable, corrupted, or hasn’t happened yet needs a value that
cannot be a real date
most end-user calendar navigation constraints, such as fiscal periods, end-of-periods, holidays,
day numbers and week numbers aren’t supported by database timestamps
integer time keys take up much less disk space than full dates
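A minimal Python sketch of a time dimension keyed by integer surrogate keys, covering a subset of the columns listed above. The reserved key 0 for an inapplicable or not-yet-happened timestamp is an assumption of this sketch; Fiscal_period, Season and Holiday_flag are left out because they depend on the business calendar.

```python
from datetime import date, timedelta

# Hypothetical reserved key for a timestamp that is inapplicable, corrupted,
# or has not happened yet; no real date is stored for it.
UNKNOWN_TIME_KEY = 0

def build_time_dimension(start, days):
    """Build time-dimension rows keyed by small integer surrogate keys 1, 2, 3, ..."""
    rows = [{"time_key": UNKNOWN_TIME_KEY, "date": None}]
    for n in range(days):
        d = start + timedelta(days=n)
        rows.append({
            "time_key": n + 1,
            "date": d,
            "day_of_week": d.isoweekday(),            # 1 = Monday ... 7 = Sunday
            "day_number_in_month": d.day,
            "day_number_overall": n + 1,
            "month": d.month,
            "weekday_flag": d.isoweekday() <= 5,      # Monday to Friday
            "last_day_in_month_flag": (d + timedelta(days=1)).month != d.month,
        })
    return rows

dim = build_time_dimension(date(2024, 1, 30), 3)
```

Fact rows then carry time_key values such as 0 or 2 instead of raw dates, and calendar constraints like "last day of month" become simple filters on the dimension.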
Q. Why have more than one fact table instead of a single fact table?
We cannot combine all of the business processes into a single fact table because:
the separate fact tables in the value chain do not share all the dimensions. You simply can’t put
the customer ship-to dimension on the finished goods inventory data
each fact table possesses different facts, and the fact table records are recorded at different
times along the value chain
Q. What is meant by Slowly Changing Dimensions, and what are the different types of SCDs?
(Mascot)
Dimensions don’t change in predictable ways. Individual customers and products evolve slowly
and episodically. Some of the changes are true physical changes. Customers change their
addresses because they move. A product is manufactured with different packaging. Other
changes are actually corrections of mistakes in the data. And finally, some changes are changes
in how we label a product or customer and are more a matter of opinion than physical reality. We
call these variations Slowly Changing Dimensions (SCDs).
The 3 fundamental choices for handling the slowly changing dimension are:
Type 1: Overwriting
Type 2: Creating another dimension record
Type 3: Creating a current value field
A Type 2 SCD creates a new dimension record and requires a generalized or surrogate key for
the dimension. We create a new surrogate key when a true physical change occurs in a dimension
entity at a specific point in time, such as the customer address change or the product packaging
change. We often add a timestamp and a reason code to the dimension record to precisely
describe the change.
The Type 2 SCD records changes of values of dimensional entity attributes over time. The
technique requires adding a new row to the dimension each time there’s a change in the value of
an attribute (or group of attributes) and assigning a unique surrogate key to the new row.
A Type 3 SCD adds a new field in the dimension record but does not create a new record. We
might change the designation of the customer’s sales territory because we redraw the sales
territory map, or we arbitrarily change the category of the product from confectionery to candy. In
both cases, we augment the original dimension attribute with an “old” attribute so we can switch
between these alternate realities.
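A minimal Python sketch of the three choices, assuming a hypothetical customer dimension held as a list of dicts with a surrogate key `sk`, a natural key `customer_no`, and `start_date`/`end_date` columns for Type 2 history. These field names are illustrative only, not part of any Informatica API.

```python
def scd_type1(dim, natural_key, attr, value):
    """Type 1: overwrite the attribute in place; history is lost."""
    for row in dim:
        if row["customer_no"] == natural_key:
            row[attr] = value

def scd_type2(dim, natural_key, attr, value, change_date):
    """Type 2: expire the current row and add a new row with a new surrogate key."""
    current = next(r for r in dim
                   if r["customer_no"] == natural_key and r["end_date"] is None)
    current["end_date"] = change_date               # close the old version
    new_row = dict(current, **{attr: value})        # copy, then apply the change
    new_row["sk"] = max(r["sk"] for r in dim) + 1   # next surrogate key
    new_row["start_date"], new_row["end_date"] = change_date, None
    dim.append(new_row)
    return new_row["sk"]

def scd_type3(row, attr, value):
    """Type 3: keep one row, moving the previous value into an 'old_' field."""
    row["old_" + attr] = row[attr]
    row[attr] = value

# Type 2 example: customer C1 moves from Boston to Denver.
dim = [{"sk": 1, "customer_no": "C1", "city": "Boston",
        "start_date": "2020-01-01", "end_date": None}]
new_sk = scd_type2(dim, "C1", "city", "Denver", "2024-06-02")
```

Type 2 keeps both rows, so facts loaded before the move keep pointing at surrogate key 1 and later facts point at surrogate key 2.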
A surrogate key is an artificial or synthetic key that is used as a substitute for a natural key. It is
just a unique identifier or number for each row that can be used for the primary key to the table.
It is useful because the natural primary key (e.g. Customer Number in the Customer table) can
change, and this makes updates more difficult.
Some tables have columns such as AIRPORT_NAME or CITY_NAME which are stated as the
primary keys (according to the business users), but not only can these change, indexing on a
numerical value is probably better, so you could consider creating a surrogate key called, say,
AIRPORT_ID. This would be internal to the system, and as far as the client is concerned you may
display only the AIRPORT_NAME.
Another benefit you can get from surrogate keys (SIDs) is in tracking Slowly Changing
Dimensions (SCDs).
A classical example:
On the 1st of January 2002, Employee 'E1' belongs to Business Unit 'BU1' (that's what would be
in your Employee Dimension). This employee has turnover allocated to him in Business
Unit 'BU1'. But on the 2nd of June, Employee 'E1' is moved from Business Unit 'BU1' to
Business Unit 'BU2'. All the new turnover has to belong to the new Business Unit 'BU2', but the
old turnover should belong to Business Unit 'BU1'.
If you used the natural business key 'E1' for your employee within your data warehouse,
everything would be allocated to Business Unit 'BU2', even what actually belongs to 'BU1'.
If you use surrogate keys, you could create on the 2nd of June a new record for the Employee
'E1' in your Employee Dimension with a new surrogate key.
This way, in your fact table, you have your old data (before 2nd of June) with the SID of the
Employee 'E1' + 'BU1.' All new data (after 2nd of June) would take the SID of the employee 'E1' +
'BU2.'
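A minimal Python sketch of the 'E1' example, assuming an in-memory employee dimension and fact list (all names hypothetical). Each fact row stores the SID that was current when it was loaded, so joining facts to the dimension on the SID keeps the old turnover in 'BU1'.

```python
# Hypothetical employee dimension: SID -> (employee code, business unit).
employee_dim = {}
current_sid = {}     # natural key -> SID of the currently valid row
fact_turnover = []   # fact rows: (sid, amount)

def new_version(emp_code, business_unit):
    """Add a new dimension row with a fresh SID and make it the current one."""
    sid = len(employee_dim) + 1
    employee_dim[sid] = (emp_code, business_unit)
    current_sid[emp_code] = sid
    return sid

def load_fact(emp_code, amount):
    # At load time the fact row receives the SID that is current for the employee.
    fact_turnover.append((current_sid[emp_code], amount))

new_version("E1", "BU1")   # 1st of January: E1 belongs to BU1
load_fact("E1", 100.0)
new_version("E1", "BU2")   # 2nd of June: E1 moves to BU2
load_fact("E1", 40.0)

turnover_by_bu = {}
for sid, amount in fact_turnover:
    bu = employee_dim[sid][1]   # join on the SID, not on the natural key 'E1'
    turnover_by_bu[bu] = turnover_by_bu.get(bu, 0.0) + amount
print(turnover_by_bu)
```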
You could consider a Slowly Changing Dimension as an enlargement of your natural key: the natural
key of the Employee was Employee Code 'E1', but for you it becomes
Employee Code + Business Unit: 'E1' + 'BU1' or 'E1' + 'BU2'. The difference from the natural key
enlargement process is that you might not have every part of your new key in your fact table,
so you might not be able to join on the new enlarged key. So you need another id.
Every join between dimension tables and fact tables in a data warehouse environment should be
based on surrogate key, not natural keys.
If there are 10,000 records and the session fails partway through loading, how will you load the
remaining data?
Answer: Session performance can be improved by allocating the cache memory so that all the
transformations can execute within that cache size.
Only the transformations that use a cache are considered performance bottlenecks here.
Likewise, use the cache calculator to calculate the cache size of every transformation that
uses a cache:
2) Joiner - 5MB
3) Look-up - 7MB
So a minimum of 18MB must be allocated to the session cache. If we allocate less memory to
the session, then the Integration Service might fail the session itself.
Answer: If you are asking about the tracing level: when you configure a
transformation, you can set the amount of detail the Integration Service writes
in the session log. PowerCenter 8.x supports 4 types of tracing level: Terse, Normal,
Verbose Initialization, and Verbose Data.
Verbose Data allows the Integration Service to write errors to both the session log and the error
log when you enable row error logging.
When you configure the tracing level to Verbose Data, the Integration Service
writes row data for all rows in a block when it processes a transformation.
Explain why it is bad practice to place a Transaction Control transformation upstream from a SQL
transformation?
Why should the input pipelines to the Joiner not contain an Update Strategy transformation?
Answer: Update Strategy flags each row for Insert, Update, Delete, or Reject. I think that when
you use it before a Joiner, the Joiner drops all the flagging details.
This is a curious question, but I cannot imagine how one would be expected to deal with the
scenario in which both incoming pipelines of the Joiner transformation have an Update Strategy. In
that scenario it would be really complicated to join rows flagged for different database
operations (Update, Insert, Delete) and then decide which operation to perform. To avoid this, I
think Informatica prohibits using an Update Strategy transformation before a Joiner
transformation.
Router is an active transformation, but one may argue that it is passive because if we use only
the default group, there is no change in the number of rows. What explanation will you give?
Answer: Update Strategy can be used whenever we need to update some existing value in the
database. For an Update Strategy we need to have a primary key defined on the target.
Answer: An unconnected lookup should be used when we need to call the same lookup multiple times
in one mapping. For example, in a parent-child relationship you need to pass multiple child IDs to
get the respective parent IDs.
One can argue that this can be achieved by creating a reusable lookup as well. That's true, but
reusable components are created when the need is across mappings, not within one mapping. Also,
if we use a connected lookup multiple times in a mapping, by default the cache would be shared.
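As a rough Python analogy (not Informatica code), an unconnected lookup behaves like a function with its own cache: you call it wherever you need it, much as an expression calls :LKP.lookup_name(...), and the cache is built on the first request and reused afterwards. The class, the lookup name and the child-to-parent rows below are hypothetical.

```python
class CachedLookup:
    """Analogy of a cached lookup: the cache is built on the first request and
    reused for every later call; a miss returns None (like a NULL return value)."""

    def __init__(self, load_rows):
        self._load_rows = load_rows   # callable returning (key, value) pairs, e.g. a query
        self._cache = None
        self.loads = 0                # how many times the source was read

    def __call__(self, key):
        if self._cache is None:       # build the cache on the first lookup request
            self._cache = dict(self._load_rows())
            self.loads += 1
        return self._cache.get(key)

# Hypothetical child -> parent lookup table.
lkp_parent = CachedLookup(lambda: [(11, 1), (12, 1), (21, 2)])
parents = [lkp_parent(child) for child in (11, 21, 99)]
```

Three calls read the source only once, and the unknown child 99 returns None.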
Answer: A dependency problem occurs when the output of one process is the input to another
process. If the first process stops, it causes problems in, or stops, the processes that depend on
it. Because one process depends on the other, if one process is affected, the other process is
affected too. This is called a dependency problem.
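A small Python sketch of how such a failure propagates, assuming a hypothetical graph in which each process lists the processes that consume its output. A breadth-first walk from the failed process finds everything downstream that cannot run.

```python
from collections import deque

# Hypothetical process graph: process -> processes that consume its output.
downstream = {
    "extract": ["transform"],
    "transform": ["load_dim", "load_fact"],
    "load_dim": ["load_fact"],
    "load_fact": ["report"],
    "report": [],
}

def affected_by(failed):
    """Return every process blocked, directly or indirectly, by the failed one."""
    blocked, queue = set(), deque([failed])
    while queue:
        for nxt in downstream[queue.popleft()]:
            if nxt not in blocked:
                blocked.add(nxt)
                queue.append(nxt)
    return blocked
```

For example, a failure in transform blocks load_dim, load_fact and report, while a failure in report blocks nothing else.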
How do you handle error logic in Informatica? What are the transformations that you used while
handling errors? How did you reload those error records into the target?
Column indicators:
D - valid
O - overflow
N - null
T - truncated
When a column contains nulls or overflow data, the row is rejected instead of being written to the target.
The rejected data is stored in reject files. You can check the data and reload it into the
target using the reject reload utility.
1. What is Data Driven?
Answer: The Informatica Server follows instructions coded into Update Strategy transformations
within the session mapping to determine how to flag records for insert, update, delete, or
reject. If you do not choose the Data Driven option setting, the Informatica Server ignores all
Update Strategy transformations in the mapping.
--------------------------------------------------------------------------------
If the Data Driven option is selected in the session properties, it follows the instructions in the
Update Strategy transformation in the mapping; otherwise it follows the instructions specified in
the session.
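A minimal Python sketch of that decision, assuming a simplified model of the session's Treat Source Rows As setting. The DD_INSERT, DD_UPDATE, DD_DELETE and DD_REJECT values (0 to 3) are the constants Update Strategy expressions use; the function itself is only an illustration, not Informatica code.

```python
# Row-flag constants used in Update Strategy expressions.
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3
FLAG_NAMES = {DD_INSERT: "insert", DD_UPDATE: "update",
              DD_DELETE: "delete", DD_REJECT: "reject"}

def resolve_operation(treat_source_rows_as, row_flag):
    """Decide the database operation for one row.

    treat_source_rows_as: simplified session setting, one of
        'data driven', 'insert', 'update' or 'delete'.
    row_flag: the flag an Update Strategy expression assigned to the row.
    """
    if treat_source_rows_as == "data driven":
        return FLAG_NAMES[row_flag]    # follow the Update Strategy instructions
    return treat_source_rows_as        # ignore row flags; use the session setting
```

With "insert" as the session setting, a row flagged DD_DELETE is still inserted, which is why the Update Strategy appears to be ignored unless Data Driven is chosen.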
You can use nested IIF statements to test multiple conditions. The following example tests for
various conditions and returns 0 if sales is zero or negative:
IIF( SALES > 0,
    IIF( SALES < 50, SALARY1,
        IIF( SALES < 100, SALARY2,
            IIF( SALES < 200, SALARY3, BONUS ))),
    0 )
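You can use DECODE instead of IIF in many cases. DECODE may improve readability. The
following shows how you can use DECODE instead of the IIF above:
DECODE( TRUE,
    SALES > 0 AND SALES < 50, SALARY1,
    SALES >= 50 AND SALES < 100, SALARY2,
    SALES >= 100 AND SALES < 200, SALARY3,
    SALES >= 200, BONUS,
    0 )
The DECODE function can be used in a SQL statement, whereas an IIF statement cannot be used in
a SQL statement.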
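To check the branching of the nested IIF expression, here is a Python mirror of the same logic (a sketch; SALARY1, SALARY2, SALARY3 and BONUS become function parameters). Note the boundaries: sales of exactly 50 fall into the SALARY2 branch, because the test is SALES < 50.

```python
def salary_for(sales, salary1, salary2, salary3, bonus):
    """Mirror of the nested IIF: returns 0 when sales is zero or negative."""
    if sales <= 0:        # outer IIF( SALES > 0, ..., 0 )
        return 0
    if sales < 50:
        return salary1
    if sales < 100:
        return salary2
    if sales < 200:
        return salary3
    return bonus
```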
Features of Informatica 8
The architecture of PowerCenter 8 has changed a lot:
1. PC8 is service-oriented for modularity, scalability and flexibility.
2. The Repository Service and Integration Service (as replacements for the Rep Server and
Informatica Server) can be run on different computers in a network (so-called nodes), even redundantly.
3. Management is centralized, which means services can be started and stopped on nodes via a
central web interface.
4. Client tools access the repository via that centralized machine; resources are distributed
dynamically.
5. Running all services on one machine is still possible, of course.
6. It supports unstructured data, including spreadsheets, email, Microsoft Word files,
presentations, and .PDF documents. It provides high availability and seamless failover, eliminating
single points of failure.
7. It adds performance improvements: to boost system performance, Informatica has
added "pushdown optimization", which moves data transformation processing to the native
relational database I/O engine whenever it is most appropriate.
8. Informatica has now added more tightly integrated data profiling, cleansing, and matching
capabilities.
9. Informatica has added a new web-based administrative console.
10. Ability to write a Custom Transformation in C++ or Java.
11. A midstream SQL transformation has been added in 8.1.1, not in 8.1.
12. Dynamic configuration of caches and partitioning.
13. The Java transformation is introduced.
14. User-defined functions.
15. The PowerCenter 8 release has an "Append to Target file" feature.
It's a session option. When the Informatica Server performs incremental aggregation, it passes new
source data through the mapping and uses historical cache data to perform new aggregation
calculations incrementally. We use it for performance.
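A minimal Python sketch of the idea, assuming the historical cache is a dict of group key to (sum, count). In PowerCenter the historical data is kept in cache files between session runs; this sketch only keeps it in memory.

```python
# Historical aggregate cache kept between runs: group key -> (sum, count).
history = {}

def incremental_aggregate(cache, new_rows):
    """Fold only the new source rows into the cached totals instead of re-reading all history."""
    for key, amount in new_rows:
        total, count = cache.get(key, (0.0, 0))
        cache[key] = (total + amount, count + 1)
    return cache

incremental_aggregate(history, [("east", 10.0), ("west", 5.0)])   # first run
incremental_aggregate(history, [("east", 2.5)])                   # next run: new rows only
```

The second run touches one row, yet the cache now holds the same totals a full re-aggregation of all three rows would produce.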
Go to the session log file; there we will find information about the errors encountered and the
load summary.
By reviewing the errors encountered during the session run, we can resolve them.
--------------------------------------------------------------------------------
There is a file called the bad file, which generally has the extension .bad and contains the
records rejected by the Informatica Server. There are two kinds of indicators, one for the type of
row and the other for the type of column. The row indicators signify what operation was going to
take place (i.e. insert, delete, update etc.). The column indicators contain information regarding
why the column has been rejected (such as violation of a not null constraint, value error, overflow etc.).
If one rectifies the errors in the data present in the bad file and then reloads the data into the
target, then the table will contain only valid data.
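A minimal Python sketch for reading one bad-file line before fixing it. The layout is an assumption to check against your own reject files: comma-delimited with no embedded commas, the row indicator first followed by its own indicator, then each column value followed by its column indicator. The sample line is hypothetical.

```python
# Row indicators: 0 insert, 1 update, 2 delete, 3 reject.
ROW_INDICATORS = {"0": "insert", "1": "update", "2": "delete", "3": "reject"}
COLUMN_INDICATORS = {"D": "valid", "O": "overflow", "N": "null", "T": "truncated"}

def parse_bad_line(line, delimiter=","):
    """Return the intended row operation and a list of (value, column status) pairs."""
    fields = line.rstrip("\n").split(delimiter)
    pairs = list(zip(fields[0::2], fields[1::2]))   # (value, indicator) pairs
    (row_indicator, _), columns = pairs[0], pairs[1:]
    return (ROW_INDICATORS[row_indicator],
            [(value, COLUMN_INDICATORS[ind]) for value, ind in columns])

op, cols = parse_bad_line("0,D,1921,D,,N,Nelson,D\n")
needs_fix = [value for value, status in cols if status != "valid"]
```

Columns whose status is not "valid" are the ones to correct before reloading the rows into the target.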
Answer: No, you cannot. If you want to start a batch that resides in another batch, create a new
independent batch and copy the necessary sessions into the new batch.