
INFORMATICA

What is an ETL?

Suppose a company one day needs consolidated reports of its assets. There are two ways to produce them. The first is entirely manual: generate separate reports from the different systems and integrate them by hand. The second is to fetch all the data from the different systems and applications, build a Data Warehouse, and generate reports from it as required. The second approach is clearly the better one.

Fetching data from different systems, making it coherent, and loading it into a Data Warehouse requires extraction, cleansing, integration, and load. ETL stands for Extraction, Transformation and Load. ETL tools provide the facility to extract data from different non-coherent systems, cleanse it, merge it, and load it into target systems.

What is Informatica?

Informatica is a tool that supports all the steps of the Extraction, Transformation and Load process. Nowadays Informatica is also used as an integration tool.

Informatica is easy to use. It has a simple visual interface, similar to forms in Visual Basic. You drag and drop different objects (known as transformations) and design a process flow for data extraction, transformation, and load. These process flow diagrams are known as mappings. Once a mapping is made, it can be scheduled to run as and when required. In the background, the Informatica server takes care of fetching data from the source, transforming it, and loading it into the target systems or databases.

Informatica can communicate with all major data sources (mainframe, RDBMS, flat files, XML, VSAM, SAP, etc.) and can move and transform data between them. It can move huge volumes of data very effectively, often better than bespoke programs written for one specific data movement. It can throttle transactions (perform big updates in small chunks to avoid long locks and filling the transaction log). It can join data from two distinct data sources (even an XML file can be joined with a relational table).

In all, Informatica can effectively integrate heterogeneous data sources and convert raw data into useful information.
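The extract, transform, and load steps, together with the idea of throttling a big load into small committed chunks, can be sketched in a few lines of Python. This is only an illustration of the concept, not how Informatica works internally: the two source lists, the `assets` table, and the function names are all hypothetical, and SQLite stands in for the target warehouse.

```python
import sqlite3
from itertools import islice

# Hypothetical source systems, each with its own inconsistent formatting.
FINANCE_ASSETS = [{"asset": " Laptop ", "value": "1200"},
                  {"asset": "server", "value": "8000"}]
FACILITY_ASSETS = [{"asset": "DESK", "value": "300"}]


def extract():
    """Extract: read rows from every source system."""
    for system, rows in (("finance", FINANCE_ASSETS), ("facility", FACILITY_ASSETS)):
        for row in rows:
            yield system, row


def transform(records):
    """Transform: cleanse asset names and convert values to a common type."""
    for system, row in records:
        yield (system, row["asset"].strip().lower(), float(row["value"]))


def load(conn, rows, chunk_size=2):
    """Load: insert in small chunks and commit after each chunk."""
    rows = iter(rows)
    commits = 0
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            break
        conn.executemany("INSERT INTO assets VALUES (?, ?, ?)", chunk)
        conn.commit()  # short transactions instead of one long one
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")  # stand-in for the target warehouse
conn.execute("CREATE TABLE assets (source_system TEXT, asset TEXT, value REAL)")
commits = load(conn, transform(extract()))
total = conn.execute("SELECT SUM(value) FROM assets").fetchone()[0]
```

Because `extract` and `transform` are generators, rows stream through the pipeline one at a time, and only `chunk_size` rows are held for each commit.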

Informatica Product Line:


Informatica is a powerful ETL tool from Informatica Corporation, a leading provider of enterprise data integration and ETL software. The important products provided by Informatica Corporation are listed below:

Power Center
Power Mart
Power Exchange
Power Center Connect
Power Channel
Metadata Exchange
Power Analyzer
Super Glue

By Subhasish Das

Power Center & Power Mart:
Power Mart is a departmental version of Informatica for building, deploying, and managing data warehouses and data marts. Power Center is used for corporate enterprise data warehouses, and Power Mart is used for departmental data warehouses such as data marts. Power Center supports global repositories and networked repositories and can be connected to several sources. Power Mart supports a single repository and can be connected to fewer sources than Power Center. Power Mart can grow into an enterprise implementation, and its codeless environment helps developer productivity.

Power Exchange:
Informatica Power Exchange, as a stand-alone service or along with Power Center, helps organizations leverage data by avoiding manual coding of data extraction programs. Power Exchange supports batch, real-time, and changed data capture options on mainframe (DB2, VSAM, IMS, etc.), midrange (AS/400 DB2, etc.), relational databases (Oracle, SQL Server, DB2, etc.), and flat files on UNIX, Linux, and Windows systems.

Power Center Connect:
This is an add-on to Informatica Power Center. It helps to extract data and metadata from ERP and messaging systems such as IBM's MQSeries, PeopleSoft, SAP, and Siebel, and from other third-party applications.

Power Channel:
This helps to transfer large amounts of encrypted and compressed data over LAN and WAN, through firewalls, over FTP, etc.

Metadata Exchange:
When used with Power Center, Metadata Exchange enables organizations to take advantage of the time and effort already invested in defining data structures within their IT environment. For example, an organization may be using data modeling tools such as Erwin, Embarcadero, Oracle Designer, or Sybase Power Designer for developing data models. The functional and technical teams will already have spent considerable time and effort creating the data model's data structures (tables, columns, data types, procedures, functions, triggers, etc.). With Metadata Exchange, these data structures can be imported into Power Center to identify source and target mappings, which reuses that time and effort. There is no need for the Informatica developer to create these data structures again.

Power Analyzer:
Power Analyzer provides organizations with reporting facilities. It makes accessing, analyzing, and sharing enterprise data simple and easily available to decision makers. It enables an organization to gain insight into business processes and develop business intelligence. With Power Analyzer, an organization can extract, filter, format, and analyze corporate information from data stored in a data warehouse, data mart, operational data store, or other data storage models. Power Analyzer works best with a dimensional data warehouse in a relational database. It can also run reports on data in any table in a relational database that does not conform to the dimensional model.

Super Glue:
Super Glue is used for loading metadata into a centralized place from several sources. Reports can be run against Super Glue to analyze metadata.

Informatica Transformations:
A transformation is a repository object that generates, modifies, or passes data. The Designer provides a set of transformations that perform specific functions. For example, an Aggregator transformation performs calculations on groups of data.

Transformations can be of two types:

Active Transformation:
An active transformation can change the number of rows that pass through it, change the transaction boundary, or change the row type. For example, Filter, Transaction Control, and Update Strategy are active transformations.

The key point to note is that the Designer does not allow you to connect multiple active transformations, or an active and a passive transformation, to the same downstream transformation or transformation input group, because the Integration Service may not be able to concatenate the rows passed by active transformations. The Sequence Generator transformation is an exception to this rule. A Sequence Generator does not receive data; it generates unique numeric values. As a result, the Integration Service does not encounter problems concatenating rows passed by a Sequence Generator and an active transformation.

Passive Transformation:
A passive transformation does not change the number of rows that pass through it, maintains the transaction boundary, and maintains the row type.

The key point to note is that the Designer allows you to connect multiple transformations to the same downstream transformation or transformation input group only if all transformations in the upstream branches are passive. The transformation that originates the branch can be active or passive.

Transformations can be Connected or Unconnected to the data flow.

Connected Transformation:
A connected transformation is connected to other transformations or directly to the target table in the mapping.

Unconnected Transformation:
An unconnected transformation is not connected to other transformations in the mapping. It is called within another transformation and returns a value to that transformation.
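The active/passive distinction can be pictured with two small Python generators. This is a conceptual sketch only: the function names and sample rows are made up for illustration and do not correspond to any Informatica API.

```python
def passive_upper(rows):
    """Passive: exactly one output row per input row."""
    for row in rows:
        yield {**row, "name": row["name"].upper()}


def active_filter(rows, city):
    """Active: may drop rows, so the row count can change."""
    for row in rows:
        if row["city"] == city:
            yield row


rows = [
    {"name": "ann", "city": "New York"},
    {"name": "bob", "city": "Boston"},
    {"name": "cy", "city": "New York"},
]

passive_out = list(passive_upper(rows))            # same number of rows as the input
active_out = list(active_filter(rows, "New York"))  # fewer rows than the input
```

The passive step preserves a one-to-one correspondence between input and output rows, which is why its output can safely be recombined with other passive branches of the same pipeline; the active step breaks that correspondence.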

Informatica Transformations

Following is the list of transformations available in Informatica:

Aggregator Transformation
Application Source Qualifier Transformation
Custom Transformation
Data Masking Transformation
Expression Transformation
External Procedure Transformation
Filter Transformation
HTTP Transformation
Input Transformation
Java Transformation
Joiner Transformation
Lookup Transformation
Normalizer Transformation
Output Transformation
Rank Transformation
Reusable Transformation
Router Transformation
Sequence Generator Transformation
Sorter Transformation
Source Qualifier Transformation
SQL Transformation
Stored Procedure Transformation
Transaction Control Transformation
Union Transformation
Unstructured Data Transformation
Update Strategy Transformation
XML Generator Transformation
XML Parser Transformation
XML Source Qualifier Transformation
External Transformation
Advanced External Procedure Transformation

The above transformations and their significance in the ETL process are discussed below in detail:

Aggregator Transformation:
Active & Connected. The Aggregator transformation performs aggregate functions such as average, sum, and count on multiple rows or groups. The Integration Service performs these calculations as it reads, and stores group and row data in an aggregate cache.

Difference between Aggregator and Expression Transformation?
The Expression transformation lets you perform calculations on a row-by-row basis only. In the Aggregator you can perform calculations on groups.

Aggregator transformation has the following ports: State, State_Count, Previous_State and State_Counter.


Components: Aggregate Cache, Aggregate Expression, Group By port, Sorted Input.

Aggregate Expressions:
Are allowed only in Aggregator transformations.
Can include conditional clauses and non-aggregate functions.
Can also include one aggregate function nested within another aggregate function.

Aggregate Functions: AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM, VARIANCE
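The difference between row-level and group-level calculation can be sketched as follows. This is a conceptual Python illustration, not Informatica expression syntax: the `sales` rows, the `amount` column, and the `min_amount` condition are hypothetical, and the dictionary of running totals merely plays the role of the aggregate cache.

```python
from collections import defaultdict

sales = [
    {"state": "NY", "qty": 2, "price": 10.0},
    {"state": "NY", "qty": 1, "price": 30.0},
    {"state": "CA", "qty": 5, "price": 4.0},
]


def expression(rows):
    """Row by row: derive a new column; the row count is unchanged."""
    return [{**r, "amount": r["qty"] * r["price"]} for r in rows]


def aggregator(rows, group_by, min_amount=0.0):
    """Per group: a conditional SUM of amount plus a COUNT of rows."""
    groups = defaultdict(lambda: {"total": 0.0, "count": 0})  # plays the aggregate cache
    for r in rows:
        g = groups[r[group_by]]
        g["count"] += 1
        if r["amount"] >= min_amount:  # conditional clause inside the aggregate
            g["total"] += r["amount"]
    return dict(groups)


with_amount = expression(sales)                              # still three rows
by_state = aggregator(with_amount, "state", min_amount=25.0)  # one entry per state
```

The expression step returns as many rows as it receives, while the aggregator step returns one result per group-by value.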


Custom Transformation:
Active/Passive & Connected. It works with procedures you create outside the Designer interface to extend PowerCenter functionality. It calls a procedure from a shared library or DLL. You can use the Custom transformation to create transformations that require multiple input groups and multiple output groups. The Custom transformation allows you to develop the transformation logic in a procedure. Some of the PowerCenter transformations are built using the Custom transformation. Rules that apply to Custom transformations, such as blocking rules, also apply to transformations built using Custom transformations.

PowerCenter provides two sets of functions, called generated functions and API functions. The Integration Service uses generated functions to interface with the procedure. When you create a Custom transformation and generate the source code files, the Designer includes the generated functions in the files. Use the API functions in the procedure code to develop the transformation logic.

Difference between Custom and External Procedure Transformation?
In the Custom transformation, input and output functions occur separately. The Integration Service passes the input data to the procedure using an input function. The output function is a separate function that you must enter in the procedure code to pass output data to the Integration Service. In contrast, in the External Procedure transformation, a single external procedure function does both input and output, and its parameters consist of all the ports of the transformation.

Data Masking Transformation:
Passive & Connected. It is used to change sensitive production data into realistic test data for non-production environments. It creates masked data for development, testing, training, and data mining. Data relationships and referential integrity are maintained in the masked data. For example, it returns a masked value that has a realistic format for an SSN, credit card number, birth date, phone number, etc., but is not a valid value.
Masking types: Key Masking, Random Masking, Expression Masking, Special Mask format. Default is no masking.

Expression Transformation:
Passive & Connected. It is used to perform non-aggregate functions, i.e. to calculate values in a single row. Examples: calculating the discount for each product, concatenating first and last names, or converting a date to a string field. You can create an Expression transformation in the Transformation Developer or the Mapping Designer.
Components: Transformation, Ports, Properties, Metadata Extensions.

External Procedure:
Passive & Connected or Unconnected. It works with procedures you create outside of the Designer interface to extend PowerCenter functionality. You can create complex functions within a DLL or in the COM layer of Windows and bind them to the External Procedure transformation. To get this kind of extensibility, use the Transformation Exchange (TX) dynamic invocation interface built into PowerCenter. You must be an experienced programmer to use TX and to use multi-threaded code in external procedures.

Filter Transformation:
Active & Connected. It passes rows that meet the specified filter condition and removes the rows that do not. For example, use it to find all the employees who work in New York, or all the faculty members teaching Chemistry in a state. The input ports for the filter must come from a single transformation. You cannot concatenate ports from more than one transformation into the Filter transformation.
Components: Transformation, Ports, Properties, Metadata Extensions.
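The Filter behavior described above, using the New York example, can be sketched in Python. This is an illustration of the idea only; the `employees` rows and the `filter_transformation` helper are hypothetical and are not Informatica filter-condition syntax.

```python
employees = [
    {"name": "Ann", "city": "New York"},
    {"name": "Bob", "city": "Boston"},
    {"name": "Cy", "city": "New York"},
]


def filter_transformation(rows, condition):
    """Pass rows that meet the condition; rows that fail it are dropped."""
    return [row for row in rows if condition(row)]


in_new_york = filter_transformation(employees, lambda row: row["city"] == "New York")
```

Rows that fail the condition are simply discarded here; nothing captures them for later use.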


HTTP Transformation:
Passive & Connected. It allows you to connect to an HTTP server to use its services and applications. With an HTTP transformation, the Integration Service connects to the HTTP server and issues a request to retrieve data from, or post data to, the server, and passes the result to the target or downstream transformation in the mapping.
Authentication types: Basic, Digest and NTLM.
Examples: GET, POST and SIMPLE POST.

Java Transformation:
Active or Passive & Connected. It provides a simple native programming interface to define transformation functionality with the Java programming language. You can use the Java transformation to quickly define simple or moderately complex transformation functionality without advanced knowledge of the Java programming language or an external Java development environment.

Joiner Transformation:
Active & Connected. It is used to join data from two related heterogeneous sources residing in different locations, or to join data from the same source. To join two sources, there must be at least one pair of matching columns between the sources, and you must specify one source as the master and the other as the detail. For example, you can join a flat file and a relational source, two flat files, or a relational source and an XML source.

Lookup Transformation:
Passive & Connected or Unconnected. It is used to look up data in a flat file, relational table, view, or synonym. It compares Lookup transformation ports (input ports) to column values in the lookup source based on the lookup condition. The returned values can then be passed to other transformations. You can create a lookup definition from a source qualifier, and you can use multiple Lookup transformations in a mapping.

You can perform the following tasks with a Lookup transformation:
*Get a related value. Retrieve a value from the lookup table based on a value in the source. For example, the source has an employee ID. Retrieve the employee name from the lookup table.
*Perform a calculation. Retrieve a value from a lookup table and use it in a calculation. For example, retrieve a sales tax percentage, calculate a tax, and return the tax to a target.
*Update slowly changing dimension tables. Determine whether rows exist in a target.

Lookup Components: Lookup source, Ports, Properties, Condition.
Types of Lookup:
1) Relational or flat file lookup.
2) Pipeline lookup.
3) Cached or un-cached lookup.
4) Connected or unconnected lookup.

Normalizer Transformation:
Active & Connected. The Normalizer transformation processes multiple-occurring columns or multiple-occurring groups of columns in each source row and returns a row for each instance of the multiple-occurring data. It is used mainly with COBOL sources, where data is most often stored in de-normalized format.

You can create the following Normalizer transformations:
*VSAM Normalizer transformation. A non-reusable transformation that is a Source Qualifier transformation for a COBOL source. VSAM stands for Virtual Storage Access Method, a file access method for IBM mainframes.
*Pipeline Normalizer transformation. A transformation that processes multiple-occurring data from relational tables or flat files. This is the default when you create a Normalizer transformation.

Components: Transformation, Ports, Properties, Normalizer, Metadata Extensions.
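What the Normalizer does to a multiple-occurring column can be sketched in Python: one de-normalized row with four quarterly columns becomes four rows. This is a conceptual illustration only; the `store` and `q1` to `q4` columns and the `occurrence` output field are hypothetical names chosen for this example.

```python
def normalizer(rows, key_ports, occurring_ports):
    """Return one output row per occurrence of the repeated column."""
    for row in rows:
        for occurrence, port in enumerate(occurring_ports, start=1):
            out = {key: row[key] for key in key_ports}
            out["occurrence"] = occurrence  # which repetition this row came from
            out["sales"] = row[port]
            yield out


source = [
    {"store": "S1", "q1": 100, "q2": 120, "q3": 90, "q4": 150},
    {"store": "S2", "q1": 80, "q2": 60, "q3": 70, "q4": 95},
]

normalized = list(normalizer(source, ["store"], ["q1", "q2", "q3", "q4"]))
```

Two input rows with four occurrences each produce eight output rows, which is why the transformation is active.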


Rank Transformation:
Active & Connected. It is used to select the top or bottom rank of data. You can use it to return the largest or smallest numeric value in a port or group, or to return the strings at the top or the bottom of a session sort order. For example, use it to select the top 10 regions by sales volume or the 10 lowest-priced products. As an active transformation, it might change the number of rows passed through it. For instance, if you pass 100 rows to the Rank transformation but select to rank only the top 10 rows, only 10 rows pass from the Rank transformation to the next transformation. You can connect ports from only one transformation to the Rank transformation. You can also create local variables and write non-aggregate expressions.

Router Transformation:
Active & Connected. It is similar to the Filter transformation because both allow you to apply a condition to test data. The only difference is that the Filter transformation drops the data that does not meet the condition, whereas the Router has an option to capture the data that does not meet the condition and route it to a default output group. If you need to test the same input data against multiple conditions, use a Router transformation in a mapping instead of creating multiple Filter transformations to perform the same task. The Router transformation is more efficient.

Sequence Generator Transformation:
Passive & Connected. It is used to create unique primary key values, to cycle through a sequential range of numbers, or to replace missing primary keys. It has two output ports: NEXTVAL and CURRVAL. You cannot edit or delete these ports. Likewise, you cannot add ports to the transformation. Connect the NEXTVAL port to a transformation or target to generate a sequence of numbers. CURRVAL is NEXTVAL plus the Increment By value (NEXTVAL plus one with the default increment). You can make a Sequence Generator reusable and use it in multiple mappings. You might reuse a Sequence Generator when you perform multiple loads to a single target.

For non-reusable Sequence Generator transformations, Number of Cached Values is set to zero by default, and the Integration Service does not cache values during the session. Setting Number of Cached Values greater than zero for a non-reusable Sequence Generator can increase the number of times the Integration Service accesses the repository during the session. It also causes sections of skipped values, since unused cached values are discarded at the end of each session.

Sorter Transformation:
Active & Connected. It is used to sort data in either ascending or descending order according to a specified sort key. You can also configure the Sorter transformation for case-sensitive sorting and specify whether the output rows should be distinct. When you create a Sorter transformation in a mapping, you specify one or more ports as a sort key and configure each sort key port to sort in ascending or descending order.

Source Qualifier Transformation:
Active & Connected. When adding a relational or flat file source definition to a mapping, you need to connect it to a Source Qualifier transformation. The Source Qualifier is used to join data originating from the same source database, filter rows when the Integration Service reads source data, specify an outer join rather than the default inner join, and specify sorted ports. It is also used to select only distinct values from the source and to create a custom query that issues a special SELECT statement for the Integration Service to read source data.
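The Source Qualifier options listed above (join within one source database, filter at read time, distinct values, sorted output) all amount to shaping the SELECT statement that reads the source. A minimal sketch of such a query, run against an in-memory SQLite database, is shown below. The `customers` and `orders` tables and the query text are hypothetical, and this is ordinary SQL rather than anything generated by Informatica.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a single source database
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, state TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ann', 'NY'), (2, 'Bob', 'CA'), (3, 'Cy', 'NY');
INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 75.0), (12, 2, 20.0), (13, 3, 15.0);
""")

# Join, filter, distinct, and sort are all pushed into the read query,
# so only the rows of interest ever leave the source database.
SOURCE_QUERY = """
SELECT DISTINCT c.name
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
WHERE c.state = 'NY' AND o.amount >= 10
ORDER BY c.name
"""

names = [row[0] for row in conn.execute(SOURCE_QUERY)]
```

Doing this work in the source query means fewer rows travel through the rest of the pipeline, at the cost of putting logic into SQL that is specific to the source database.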

SQL Transformation:
Active/Passive & Connected. The SQL transformation processes SQL queries midstream in a pipeline. You can insert, delete, update, and retrieve rows from a database. You can pass the database connection information to the SQL transformation as input data at run time. The transformation processes external SQL scripts or SQL queries that you create in an SQL editor. The SQL transformation processes the query and returns rows and database errors.

Stored Procedure Transformation:
Passive & Connected or Unconnected. It is useful for automating time-consuming tasks. It is also used in error handling, to drop and recreate indexes, to determine whether enough space exists in a database, to perform a specialized calculation, etc. The stored procedure must exist in the database before you create a Stored Procedure transformation, and it can exist in a source, target, or any database with a valid connection to the Informatica Server. A stored procedure is an executable script with SQL statements, control statements, user-defined variables, and conditional statements.

Transaction Control Transformation:
Active & Connected. You can control commit and rollback of transactions based on a set of rows that pass through a Transaction Control transformation. Transaction control can be defined within a mapping or within a session.
Components: Transformation, Ports, Properties, Metadata Extensions.

Union Transformation:
Active & Connected. The Union transformation is a multiple input group transformation that you use to merge data from multiple pipelines or pipeline branches into one pipeline branch. It merges data from multiple sources in the same way that the UNION ALL SQL statement combines the results of two or more SQL statements. Like the UNION ALL statement, the Union transformation does not remove duplicate rows.
Rules:
1) You can create multiple input groups, but only one output group.
2) All input groups and the output group must have matching ports. The precision, data type, and scale must be identical across all groups.
3) The Union transformation does not remove duplicate rows. To remove duplicate rows, you must add another transformation such as a Router or Filter transformation.
4) You cannot use a Sequence Generator or Update Strategy transformation upstream from a Union transformation.
5) The Union transformation does not generate transactions.
Components: Transformation tab, Properties tab, Groups tab, Group Ports tab.

Unstructured Data Transformation:
Active/Passive & Connected. The Unstructured Data transformation processes unstructured and semi-structured file formats, such as messaging formats, HTML pages, and PDF documents. It also transforms structured formats such as ACORD, HIPAA, HL7, EDI-X12, EDIFACT, AFP, and SWIFT.
Components: Transformation, Properties, UDT Settings, UDT Ports, Relational Hierarchy.

Update Strategy Transformation:
Active & Connected. It is used to update data in the target table, either to maintain a history of the data or to keep only recent changes. It flags rows for insert, update, delete, or reject within a mapping.

XML Generator Transformation:
Active & Connected. It lets you create XML inside a pipeline. The XML Generator transformation accepts data from multiple ports and writes XML through a single output port.
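The Union rules listed earlier in this section (several input groups, one output group, matching ports, duplicates kept) can be sketched in Python. This is a conceptual illustration with hypothetical names; it checks only that port names match and does not model precision, data type, or scale.

```python
def union_transformation(*input_groups):
    """Merge several input groups into one output group, keeping duplicates."""
    expected_ports = None
    merged = []
    for group in input_groups:
        for row in group:
            ports = tuple(row)
            if expected_ports is None:
                expected_ports = ports
            elif ports != expected_ports:
                raise ValueError(f"ports {ports} do not match {expected_ports}")
            merged.append(row)
    return merged


online = [{"customer": "Ann", "amount": 50.0}, {"customer": "Bob", "amount": 20.0}]
in_store = [{"customer": "Ann", "amount": 50.0}, {"customer": "Cy", "amount": 15.0}]

merged = union_transformation(online, in_store)  # like UNION ALL: the duplicate stays
```

The identical Ann row appears twice in `merged`; removing it would need a separate step after the union.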

XML Parser Transformation:
Active & Connected. The XML Parser transformation lets you extract XML data from messaging systems, such as TIBCO or MQ Series, and from other sources, such as files or databases. Its functionality is similar to the XML source functionality, except that it parses the XML in the pipeline.

XML Source Qualifier Transformation:
Active & Connected. The XML Source Qualifier is used only with an XML source definition. It represents the data elements that the Informatica Server reads when it executes a session with XML sources. It has one input or output port for every column in the XML source.

External Procedure Transformation:
Passive & Connected/Unconnected. Sometimes the standard transformations, such as the Expression transformation, may not provide the functionality that you want. In such cases the External Procedure transformation is useful for developing complex functions within a dynamic link library (DLL) or UNIX shared library, instead of creating the necessary Expression transformations in a mapping.

Advanced External Procedure Transformation:
Active & Connected. It operates in conjunction with procedures that are created outside of the Designer interface to extend PowerCenter/Power Mart functionality. It is useful for creating external transformation applications, such as sorting and aggregation, which require all input rows to be processed before emitting any output rows.

Benefits of using Informatica:


To conclude, the benefits of Informatica can be summarized as follows:

Provide the right information, at the right time.
Gain universal, right-time data access, that is, batch, near real-time, and real-time access.
Deliver trusted, timely data throughout the enterprise to meet both analytical and operational needs.
Improve confidence in data through enterprise-wide visibility into data definitions, lineage, and relationships, and through increased data accuracy and consistency.
Answer the questions your business has about its data, and supply the business with the high-quality data it needs, when it needs it, to make better and timelier decisions.
