10.1.3 Lesson
Rev 2.2
03/03/2008
Authors: FX Nicolas, Christophe Dupupet, Craig Stewart
Main Contributors/Reviewers: Nick Malfroy, Julien Testut, Matt Dahlman, Richard Soule, Bryan Wise
Oracle Corporation World Headquarters, 500 Oracle Parkway, Redwood Shores, CA 94065, USA
Worldwide inquiries: Phone: +1 650 506 7000, Fax: +1 650 506 7200, www.oracle.com
Oracle is the information company. Oracle is a registered trademark of Oracle Corporation. Various product and service names referenced herein may be trademarks of Oracle Corporation. All other product and service names mentioned may be trademarks of their respective owners.
Copyright 2008 Oracle Corporation. All rights reserved.
Introduction
Objectives
After completing this training, you should:
Have a clear understanding of the ODI architecture
Have a clear understanding of the ODI differentiators
Have some experience in developing with ODI
Be ready for your first live projects
Before We Start
Please copy and unzip the VMware image on your machine
Lessons
General Information
Overview of the product, sales tactics, positioning Architecture
Additional Features
Data Profiling Data Quality Versioning
Methodology
Install the GUI; create the repositories.
Define users and profiles.
Define the IS architecture: physical and logical view.
Reverse-engineering of the metadata: table, view and synonym definitions; constraints.
Definition of the elementary transformations: Which are the targets? Which are the sources for each target? Define transformation rules and control rules. Define the transfer rules.
Unit tests: understand the outcome; debugging.
Optimize strategies: Knowledge Modules.
Define the sequencing: order the interfaces; integration tests; scenario generation.
Define the scheduling: agent configuration; execution frequency.
Packaging / delivery: freeze the version; deliver the scenarios.
Operations.
Install Security Topology Designer Designer Model Def. Project/ Interface Operator Project/KM Project/Pkg Agents/Scen
Project/Scen.
Operator
1
1-1
Objectives
After completing this lesson, you should be able to describe:
The scope of Data Integration for batch and near real time integration
The difference between ODI ELT and other ETL tools on the market for batch approaches
General overview of the ODI architecture, and how it combines ELT and SOA in the same product architecture
1-2
Data Integration
Migration Data Warehousing Master Data Management Data Synchronization --------Federation Real Time Messaging
HAVE
Data in Disparate Sources
Legacy
ERP
CRM
Best-of-breed Applications
1-3
1.
2.
Convergence of integration solutions Shift from custom coding to declarative design Shift to pattern-driven development
3.
4.
1-4
1-5
Runtime
Knowledge Modules
Data Flow
Metadata repository
Pluggable on many RDBMS Ready for deployment Modular and extensible metadata
Metadata Management
Master Repository Work Repositories Runtime Repositories
1-6
Development Servers and Applications Execution Agent Data Flow Conductor Return Codes
CRM Data Warehouse
Legacy ERP
Production
ODI Runtime Environment User Interfaces Topology/Security Administrators Execution Log Operators Runtime Repository Code Execution Log Agent Data Flow Conductor Execution Return Codes
CRM
Data Warehouse
Legacy
1-7
BENEFITS
1. 2. 3. 4.
1-8
1 1
Extract
Transform
Load
E-LT
Transform Transform Extract Load
Benefits
Optimal Performance & Scalability Easier to Manage & Lower Cost
1-9
2 2
Evolve from Batch to Near Realtime Warehousing on Common Platform Unify the Silos of Data Integration Data Integrity on the Fly Services Plug into Oracle SOA Suite
Benefits Enables real-time data warehousing and operational data hubs Services plug into Oracle SOA Suite for comprehensive integration
1-10
3 3
Benefits
Significantly reduce the learning curve Shorter implementation times Streamline access to non-IT pros
1-11
4 4
Reverse
Staging Tables
Load CDC
Sources Journalize
Integrate Check
Error Tables Target Tables
Services
DB2 Journals
DB2 Exp/Imp
Oracle SQL*Loader
Check Sybase
Type II SCD
Benefits Tailor to existing best practices Ease administration work Reduce cost of ownership
1-12
1-13
4. 5. 6.
CRM
Data Warehouse
CRM
Data Warehouse
Legacy
Legacy
ERP
1-14
Extended Capabilities
1-15
Extended Capabilities
Master Data Management enabled
Common Format Designer Automated generation of canonical format and transformations Built-in Data Integrity
Real-time enabled
Changed Data Capture Message Oriented Integration (JMS)
SOA enabled
Generation of Data Services Generation of Transformation Services
Extensibility
Knowledge Modules Framework Scripting Languages Open Tools
1-16
Use Cases
1-17
Aggregate Export
Cube
Operational
Analytics
-------------
Data Warehouse
Cube
Cube
Metadata
1-18
SOA Initiative
Establish Messaging Architecture for Integration Incorporate Efficient Bulk Data Processing with ODI
Services
Business Processes
Transformation
Invoke external services for data integration Deploy data services Deploy transformation services Integrate data and transformation services in your SOA infrastructure
Operational
Others
Metadata
1-19
Master Data
CDC -------------
Metadata
1-20
10
Migration
Upgrade Applications or Migrate to New Schema Move Bulk Data Once and Keep in Sync with ODI
CDC
-------------
CDC
Bulk-load historical data to new application Transform source format to target Synchronize new and old applications during overlap time Capture changes in a bidirectional way (CDC)
Old Applications
Other Sources
New Application
Metadata
1-21
SAP/R3
PeopleSoft
Oracle EBS
1-22
11
1-23
High Performance Loading of BAMs Active Data Cache Pre-built and Integrated
CDC
PeopleSoft
Message Queues
1-24
12
OTN (external):
http://otn.oracle.com/goto/odi
Field support:
ORACLEDI-COMMUNITY_WW@oracle.com
Forum:
http://forums.oracle.com/forums/forum.jspa?forumID=374&start=0
KMs:
http://odi.fr.oracle.com
1-25
Lesson summary
Data Integration Challenges
Market Positioning of ODI
1-26
13
1-27
14
2
2-1
Objectives
After completing this lesson, you should:
Know the different components of the ODI architecture Understand the structure of the Repositories
2-2
Components
2-3
Graphical Modules
Designer Reverse-Engineer Develop Projects Release Scenarios Java - Any Platform Any ISO-92 RDBMS
Repository
2-4
Run-Time Components
Designer Reverse-Engineer Develop Projects Release Scenarios Java - Any Platform Operator Operate production Monitor sessions
Monitor sessions View Reports
Submit Jobs
Repository
Any ISO-92 RDBMS Scheduler Agent Handles schedules Orchestrate sessions Java - Any Platform
Return Code
Execute Jobs
Information System
2-5
Metadata Navigator
Any Web Browser Browse metadata lineage Operate production
Repository
Any ISO-92 RDBMS Scheduler Agent Handles schedules Orchestrate sessions Java - Any Platform Metadata Navigator Web access to the repository J2EE Application Server
Submit Executions
Return Code
Execute Jobs
Information System
2-6
SOA
Designer Generate and deploy Web Services
Repository
Any ISO-92 RDBMS Scheduler Agent Handles schedules Orchestrate sessions Java - Any Platform Tomcat / OC4J Web Services presentation J2EE Application Server
Return Code
Execute Jobs
Information System
2-7
Repository
Any ISO-92 RDBMS Scheduler Agent Handles schedules Orchestrate sessions Java - Any Platform Information System Repository Access HTTP Connection Execution Query Metadata Navigator Web access to the repository J2EE Application Server
2-8
ODI Repositories
2-9
Models Projects Execution Work Repository (Development) Execution Execution Repository (Production)
Two types of Repositories: Master and Work
Work Repositories are always attached to a Master Repository
2 - 10
Master Repository
Models Projects Execution Work Repository (Development) Models Projects Execution Work Repository (Test & QA)
Lesson summary
2 - 12
2 - 13
Oracle Data Integrator First Project Simple Transformations: One source, one target
3
3-1
Objectives
After completing this lesson, you will know how to:
Create a first, basic interface Create a filter Select a Knowledge Module and set the options Understand the generated code in the Operator Interface
3-2
3-3
Toolbar
Selection Panel
Metadata
Project
3-4
Terminology
ETL/ELT projects are designed in the Designer tool.
Transformations in ODI are defined in objects called Interfaces.
Interfaces are stored in Projects.
Interfaces are sequenced in a Package that is ultimately compiled into a Scenario for production execution.
3-5
Interface
An Interface will define
Where the data are sent to (the Target)
Where the data are coming from (the Sources)
How the data are transformed from the source format to the target format (the Mappings)
How the data are physically transferred from the sources to the target (the data Flow)
Sources and targets are defined using Metadata imported from the databases and other systems.
Mappings are expressed in SQL.
Flows are defined in templates called Knowledge Modules (KMs).
3-6
Interfaces are created in Projects.
To create any object in ODI, right-click on the parent node and select Insert xyz.
This is true for interfaces as well: on the project's Interfaces entry, right-click and select Insert Interface.
3-7
3-8
Metadata
3-9
Automatic Mappings
Automatic Mapping creates mappings by matching column names automatically. ODI will prompt you before doing so: you have the option to disable this feature.
3-10
3-11
3-12
Note
An interface populates only a single target datastore. To populate several targets, you need several interfaces.
3-13
Use the expression editor for the list of supported functions and operators MAX(), MIN(), etc. ODI automatically generates the GROUP BY clause.
Any combination of clauses is allowed: SRC_SALES_PERSON.FIRST_NAME || ' ' || UCASE(SRC_SALES_PERSON.LAST_NAME)
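The mapping expression above can be tried outside ODI. The following is a minimal sketch that runs it against an in-memory SQLite database, not ODI-generated code. Assumptions: SQLite's UPPER() stands in for UCASE(), and the SALES_PERS_ID column and the SRC_ORDERS table are hypothetical names added for illustration. The second query shows the shape of an aggregate mapping, where the non-aggregated column lands in the GROUP BY clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SRC_SALES_PERSON (SALES_PERS_ID INTEGER, FIRST_NAME TEXT, LAST_NAME TEXT);
    CREATE TABLE SRC_ORDERS (ORDER_ID INTEGER, SALES_PERS_ID INTEGER, AMOUNT REAL);
    INSERT INTO SRC_SALES_PERSON VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
    INSERT INTO SRC_ORDERS VALUES (10, 1, 100.0), (11, 1, 250.0), (12, 2, 80.0);
""")

# Mapping expression from the slide; UPPER() replaces UCASE(), which SQLite lacks.
full_names = conn.execute(
    "SELECT SRC_SALES_PERSON.FIRST_NAME || ' ' || UPPER(SRC_SALES_PERSON.LAST_NAME) "
    "FROM SRC_SALES_PERSON ORDER BY SALES_PERS_ID"
).fetchall()

# Aggregate mapping: MAX() on one column, so the other column goes to GROUP BY.
max_by_person = conn.execute(
    "SELECT SALES_PERS_ID, MAX(AMOUNT) FROM SRC_ORDERS "
    "GROUP BY SALES_PERS_ID ORDER BY SALES_PERS_ID"
).fetchall()

print(full_names)
print(max_by_person)
```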
3-14
Filtering Data
Drag and drop a column on the background area Then type the filter expression
Check expression. SQL filter expression Execution location Expression editor Save expression
3-15
Click the Apply button to save the interface You can press the OK button to save and close the interface. The Cancel button closes the interface without saving it. Interfaces are saved in the Work Repository.
3-16
Note
An interface may have more than one source. For this lesson, we will only use one source.
3-17
3-18
3-19
KM and KM Options
Click on the caption to display Loading KM choices and options Click on the caption to display the Integration KM choices and options
3-20
10
Important Note
Important! Make sure that the appropriate KMs have been imported into the project!
3-21
Interfaces: Execution
3-22
11
Requirements
To run an interface, you need at least the following:
A target table An Integration Knowledge Module (selected in the Flow tab) A Loading Knowledge Module if there is a remote source.
If you have all the prerequisites, you are ready to execute the interface.
3-23
Running an Interface
Simply click the Execute button
3-24
12
3-25
Code Generation
When we ask ODI to Execute the transformations, ODI will generate the necessary code for the execution (usually SQL code) The code is stored in the repository The execution details are available in the Operator Interface:
Statistics about the jobs (duration, number of records processed, inserted, updated, deleted) Actual code that was generated and executed by the database Error codes and error messages returned by the databases if any
3-26
13
3-27
3-28
14
3-29
Errors Reporting
The red icon in the tree indicates the steps that failed Error Codes and Error Messages are reported at all levels
3-30
15
3-31
3-32
16
Course Summary
Create Interfaces and define transformations (mappings)
Understand Data Flows, select KMs and set KM options
Understand how to follow up on the execution
3-33
3-34
17
4
4-1
Objectives
After completing this lesson, you will:
Understand how to design an interface with multiple sources.
Know how to define relations between the sources using joins.
Better understand an interface's flow.
Be able to customize the default flow of an interface.
Be able to appropriately choose a Staging Area.
4-2
4-3
Multiple Sources
You can add more than one source datastore to an interface. These datastores must be linked using joins. Two ways to create joins:
References in the models automatically become joins in the diagram. Joins must be manually defined in the diagram for isolated datastores.
4-4
Note
Important!
1.
Drag and drop a column from one datastore onto a column in another datastore.
A join linking the two datastores appears in the diagram. In the join code box, an expression joining the two columns also appears.
2.
3. 4.
4-6
Setting up a Join
Joins can be defined across technologies (here a database table and a flat file) The number of joins per interface is not limited
Validate expression
Expression editor Save expression Join order (ISO-92 Syntax) Use ISO-92 syntax Automatically calculate order
4-7
Types of Joins
The following types of joins exist:
Cross Join: Cartesian product. Every combination of any customer with any order, without restriction.
Inner Join: Only records where a customer and an order are linked.
Left Outer Join: All the customers combined with any linked orders, or blanks if none.
Right Outer Join: All the orders combined with any linked customer, or blanks if none.
Full Outer Join: All customers and all orders.
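The join types can be compared on a tiny data set. This is an illustrative sketch using SQLite from Python, with hypothetical CUSTOMER and ORDERS tables (one customer without orders, one order without a known customer). Because older SQLite builds lack RIGHT and FULL OUTER JOIN, the right outer join is emulated here by swapping the two sides of a LEFT OUTER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMER (CUST_ID INTEGER, NAME TEXT);
    CREATE TABLE ORDERS (ORDER_ID INTEGER, CUST_ID INTEGER);
    INSERT INTO CUSTOMER VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO ORDERS VALUES (10, 1), (11, 99);
""")

def run(sql):
    """Execute a query and return all rows."""
    return conn.execute(sql).fetchall()

# Cross join: every customer paired with every order.
cross_rows = run("SELECT c.NAME, o.ORDER_ID FROM CUSTOMER c CROSS JOIN ORDERS o")

# Inner join: only linked customer/order pairs.
inner_rows = run(
    "SELECT c.NAME, o.ORDER_ID FROM CUSTOMER c "
    "JOIN ORDERS o ON o.CUST_ID = c.CUST_ID"
)

# Left outer join: all customers, with blanks (NULL) where no order is linked.
left_rows = run(
    "SELECT c.NAME, o.ORDER_ID FROM CUSTOMER c "
    "LEFT OUTER JOIN ORDERS o ON o.CUST_ID = c.CUST_ID ORDER BY c.CUST_ID"
)

# Right outer join, emulated by swapping sides: all orders, with blanks for customers.
right_rows = run(
    "SELECT c.NAME, o.ORDER_ID FROM ORDERS o "
    "LEFT OUTER JOIN CUSTOMER c ON o.CUST_ID = c.CUST_ID ORDER BY o.ORDER_ID"
)

print(len(cross_rows), inner_rows, left_rows, right_rows)
```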
4-8
4-9
Active Mapping
When unchecked, the filter, join or mapping is disabled for this interface
Choose the update key by selecting the Key checkbox Change the execution location of the filter, join or mapping.
4-10
Active Mapping When unchecked, the filter, join or mapping is disabled for this interface Enable mapping for update and/or insert Allows mappings to only apply to updates or inserts. By default, both insert and update are enabled Choose the update key by selecting the Key checkbox Change the execution location of the filter, join or mapping.
4-11
4-12
An update key: is a set of columns capable of uniquely identifying one row in the target datastore is used for performing updates and flow control can be:
one of the primary/unique keys defined for the datastore defined specially for the interface
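To make the role of the update key concrete, here is a minimal sketch of an update-then-insert integration driven by a key. It is not the code an ODI Knowledge Module generates; the function name integrate, the TARGET_CUSTOMER table and its columns are hypothetical, and the sketch assumes at least one non-key column exists.

```python
import sqlite3

def integrate(conn, table, columns, update_key, rows):
    """Update rows matched on update_key; insert the rows that match nothing."""
    non_key = [c for c in columns if c not in update_key]  # assumed non-empty
    update_sql = "UPDATE {} SET {} WHERE {}".format(
        table,
        ", ".join(f"{c} = ?" for c in non_key),
        " AND ".join(f"{c} = ?" for c in update_key),
    )
    insert_sql = "INSERT INTO {} ({}) VALUES ({})".format(
        table, ", ".join(columns), ", ".join("?" for _ in columns)
    )
    updated = inserted = 0
    for row in rows:
        rec = dict(zip(columns, row))
        params = [rec[c] for c in non_key] + [rec[c] for c in update_key]
        if conn.execute(update_sql, params).rowcount == 0:
            conn.execute(insert_sql, row)   # key not found in target: new row
            inserted += 1
        else:
            updated += 1                    # key found: existing row refreshed
    return updated, inserted

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TARGET_CUSTOMER (CUST_ID INTEGER, NAME TEXT, CITY TEXT)")
conn.execute("INSERT INTO TARGET_CUSTOMER VALUES (1, 'Ann', 'Paris')")

counts = integrate(
    conn, "TARGET_CUSTOMER", ["CUST_ID", "NAME", "CITY"], ["CUST_ID"],
    [(1, "Ann", "Lyon"), (2, "Bob", "Rome")],
)
final_rows = conn.execute("SELECT * FROM TARGET_CUSTOMER ORDER BY CUST_ID").fetchall()
print(counts, final_rows)
```

The key only has to identify one target row uniquely, which is why it can be a declared primary or unique key or a set of columns chosen just for this interface.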
4-13
To define a new key in the interface only:
1. Choose <Undefined> for the update key.
2. Select one target column to make part of the update key.
3. Check the Key checkbox in the properties panel.
4. Repeat for each column in the update key.
To define a new key for the table that could be used in other interfaces:
1. Go back to the Model.
2. Expand the table.
3. Right-click on Constraints and add a new key (more on this in a later chapter).
4-14
4-15
You may need to change the execution location if: The technology at the current location does not have the features required
Files, JMS, etc do not support transformations A required function is not available
4-16
Take care when changing the execution location or moving the staging area location. You should double check the transformation syntax.
4-17
4-18
Flow The path taken by data from the sources to the target in an ODI interface. The flow determines where and how data will be extracted, transformed, then integrated into the target.
4-19
Note
Understanding the flow will avoid many problems at run time. Mastering this concept will help you to improve performance.
4-20
10
4-21
Source Sybase
ORDERS
Target Oracle
LINES
SALES
4-22
11
Target: Oracle
SALES
11
LINES Extract/Join/Transform
C$_0
55 33
I$_SALES
Join/Transform
CORRECTIONS File
22
Extract/Transform
C$_1
4-23
Staging Area A separate, dedicated area in an RDBMS where ODI creates its temporary objects and executes some of your transformation rules. By default, ODI sets the staging area on the target data server.
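The default case (staging area on the target server) can be sketched with two in-memory SQLite databases standing in for the source and target servers. This is only an illustration of the flow, not ODI-generated code. The C$_0 and I$_SALES names follow the diagrams in this lesson; the STATUS column, the filter and the sample rows are hypothetical.

```python
import sqlite3

source = sqlite3.connect(":memory:")  # stands in for the remote source server
target = sqlite3.connect(":memory:")  # target server, also hosting the staging area

source.executescript("""
    CREATE TABLE ORDERS (ORDER_ID INTEGER, STATUS TEXT);
    CREATE TABLE LINES (ORDER_ID INTEGER, AMOUNT REAL);
    INSERT INTO ORDERS VALUES (1, 'CLOSED'), (2, 'OPEN');
    INSERT INTO LINES VALUES (1, 10.0), (1, 5.0), (2, 7.0);
""")
target.executescript("""
    CREATE TABLE SALES (ORDER_ID INTEGER PRIMARY KEY, TOTAL REAL);
    CREATE TABLE "C$_0" (ORDER_ID INTEGER, AMOUNT REAL);
    CREATE TABLE "I$_SALES" (ORDER_ID INTEGER, TOTAL REAL);
""")

# Loading: join and filter on the source, then transfer the result into C$_0.
extracted = source.execute(
    "SELECT o.ORDER_ID, l.AMOUNT FROM ORDERS o "
    "JOIN LINES l ON l.ORDER_ID = o.ORDER_ID WHERE o.STATUS = 'CLOSED'"
).fetchall()
target.executemany('INSERT INTO "C$_0" VALUES (?, ?)', extracted)

# Transformation in the staging area: C$_0 feeds the integration table I$_SALES.
target.execute(
    'INSERT INTO "I$_SALES" SELECT ORDER_ID, SUM(AMOUNT) FROM "C$_0" GROUP BY ORDER_ID'
)

# Integration: I$_SALES is written to the target datastore.
target.execute('INSERT INTO SALES SELECT ORDER_ID, TOTAL FROM "I$_SALES"')

# Temporary objects are dropped once the flow is complete.
target.execute('DROP TABLE "C$_0"')
target.execute('DROP TABLE "I$_SALES"')

sales = target.execute("SELECT * FROM SALES").fetchall()
print(sales)
```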
4-24
12
The Staging Area cannot be placed on non relational systems (Flat files, ESBs, etc.)
4-25
13
1. Go to the Definition tab of your interface.
2. To choose the Staging Area, check the Staging Area Different From Target option, then select the logical schema that will be used as the Staging Area.
3. To leave the Staging Area on the target, uncheck the Staging Area Different From Target option.
4. Go to the Flow tab. You can now see the new flow.
4-27
Staging Area
Transform & Integrate
11
LINES Extract/Join/Transform
C$_0
55 33
I$_SALES
SALES
Join/Transform
CORRECTIONS File
22
Extract/Transform
C$_1
4-28
14
Case #1 in ODI
Staging Area in the Target
Source Sets
4-29
Target (Oracle)
SALES
11
LINES Extract/Join/Transform
C$_0
55 33
I$_SALES
Join/Transform
CORRECTIONS File
22
Extract/Transform
C$_1
4-30
15
Case #2 in ODI
Staging Area is the Sunopsis Memory Engine
Target
Source Sets
Staging Area
4-31
Source (Sybase)
ORDERS
Staging Area 11
C$_0
Target (Oracle)
SALES
55 33
I$_SALES
LINES
Extract/Join/Transform Join/Transform
C$_1
22
CORRECTIONS File Extract/Transform
4-32
16
Case #3 in ODI
Staging Area in the Source
Target
Source Sets
Staging Area
4-33
4-34
17
When processing happens between two data servers, a data transfer KM is required.
Before integration (Source to Staging Area): requires an LKM, which is always multi-technology.
At integration (Staging Area to Target): requires a multi-technology IKM.
When processing happens within a data server, it is entirely performed by the server.
A single-technology IKM is required. No data transfer is performed
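These rules can be written down as a small decision function. The sketch below is a reading of the rules on this slide, not an ODI API: the function name required_kms and the server labels are hypothetical.

```python
def required_kms(source_server, staging_server, target_server):
    """Return the kinds of KM needed for one source set, per the rules above."""
    kms = []
    if source_server != staging_server:
        # Data must be transferred from the source to the staging area.
        kms.append("LKM (multi-technology)")
    if staging_server != target_server:
        # Data must be transferred from the staging area to the target.
        kms.append("IKM (multi-technology)")
    else:
        # Staging area and target share a server: no transfer at integration.
        kms.append("IKM (single-technology)")
    return kms

# Staging area on the target, on a third server, and on the source.
case_1 = required_kms("sybase", "oracle", "oracle")
case_2 = required_kms("sybase", "memory_engine", "oracle")
case_3 = required_kms("sybase", "sybase", "oracle")
print(case_1, case_2, case_3)
```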
4-35
Source
Staging area
Target
4-36
18
More on KMs
4-37
Case #1
Using the Target as the Staging Area
Target (Oracle) Source (Sybase)
ORDERS
Staging Area
LKM_1 LKM_1
LINES LKM SQL to Oracle
C$_0
SALES
CORRECTIONS File
LKM_2 LKM_2
LKM File to Oracle (SQLLDR)
C$_1
4-38
19
Case #2
Using a third server as the Staging Area
Sunopsis Memory Engine Source (Sybase) Staging Area
ORDERS IKM SQL to SQL Append
Target (Oracle)
SALES
IKM_1 IKM_1
C$_1
I$_SALES
CORRECTIONS File
LKM_2 LKM_2
LKM File to SQL
4-39
Case #3
Using the Source as the Staging Area
Source (Sybase)
ORDERS
Target (Oracle)
SALES
IKM_1 IKM_1
LINES
IKM_1 IKM_1
C$_1
LKM_1 LKM_1
CORRECTIONS File LKM File to SQL
4-40
20
1. Go to the interface's Flow tab.
2. Select the Source Set from which data will be extracted. The KM property panel opens.
3. Change the Name of the Source Set (optional).
4. Select an LKM.
5. Modify the LKM's Options.
4-41
4-42
21
4-43
Common KM Options
The following options appear in most KMs:
INSERT UPDATE COMMIT
FLOW CONTROL STATIC CONTROL TRUNCATE DELETE ALL
DELETE TEMPORARY OBJECTS
Should temporary tables and views be deleted or kept for debugging purposes?
4-44
22
4-45
Lesson Summary
Using multiple, heterogeneous source datastores
4-46
23
4-47
24
5
5-1
Objectives
After completing this lesson, you should:
Have a generic understanding of the Metadata in ODI
Be ready to do a more exploratory hands-on, tying together metadata and advanced transformations
5-2
Metadata in ODI
Metadata in ODI are available in the Model tab. Each Model will contain the tables from database schema. A model can contain all tables from a schema, or only a subset of the tables of the schema Models can contain sub models for an easier organization of the tables from a schema
5-3
To maintain the hierarchical view of the XML file, the driver will automatically create primary keys and foreign keys. To retain the order in which the records appear in the XML file, the driver will add an Order column.
5-4
Lesson summary
Introduction to Models
5-5
5-6
6
6-1
Objectives
After completing this lesson, you will:
Know the different types of data quality business rules ODI manages. Be able to enforce data quality with ODI. Understand how to create constraints on datastores.
6-2
Data Quality should be managed in all three sub-systems.
ODI provides the solution for enforcing quality in all three.
6-3
6-4
Reference rules
Simple: column A = column B Complex: column A = function(column B, column C)
Validation rules
Mandatory Columns Conditions
6-5
Source
ORDERS Errors Integration Process LINES
Target
SALES
Errors
6-6
6-7
Require a Check Knowledge Module (CKM)
Are monitored through Operator
Copy invalid rows into the Error table
Flow control then deletes them from the flow.
Static control leaves them in the datastores.
The Error table can be viewed from Designer or any SQL tool.
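The difference between the two controls can be sketched in a few lines of SQL run from Python. This is an illustration of the behavior described above, not the code a CKM generates. Assumptions: the E$_SALES error table name, its ERR_MESS column, the helper control() and the two rules are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "I$_SALES" (ORDER_ID INTEGER, AMOUNT REAL);
    CREATE TABLE SALES (ORDER_ID INTEGER, AMOUNT REAL);
    CREATE TABLE "E$_SALES" (ORDER_ID INTEGER, AMOUNT REAL, ERR_MESS TEXT);
    INSERT INTO "I$_SALES" VALUES (1, 10.0), (2, NULL), (3, -5.0);
    INSERT INTO SALES VALUES (4, -1.0), (5, 20.0);
""")

def control(table, rule, invalid_where, delete_invalid):
    """Copy rows violating a rule into the error table; optionally delete them."""
    conn.execute(
        f'INSERT INTO "E$_SALES" SELECT ORDER_ID, AMOUNT, ? '
        f'FROM "{table}" WHERE {invalid_where}',
        (rule,),
    )
    if delete_invalid:
        conn.execute(f'DELETE FROM "{table}" WHERE {invalid_where}')

# Flow control: invalid rows are removed from the flow before integration.
control("I$_SALES", "AMOUNT is mandatory", "AMOUNT IS NULL", delete_invalid=True)
control("I$_SALES", "AMOUNT must not be negative", "AMOUNT < 0", delete_invalid=True)

# Static control: invalid rows are reported but left in the datastore.
control("SALES", "AMOUNT must not be negative", "AMOUNT < 0", delete_invalid=False)

flow_rows = conn.execute('SELECT ORDER_ID FROM "I$_SALES"').fetchall()
sales_rows = conn.execute("SELECT ORDER_ID FROM SALES ORDER BY ORDER_ID").fetchall()
error_rows = conn.execute('SELECT ORDER_ID FROM "E$_SALES" ORDER BY ORDER_ID').fetchall()
print(flow_rows, sales_rows, error_rows)
```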
6-8
Constraints in ODI
Mandatory Columns Keys
Primary Keys Alternate Keys Indexes
References
Simple: column A = column B Complex: column A = function(column B)
Conditions
6-9
Mandatory Columns
1. Double-click the column in the Models view.
2. Select the Control tab.
3. Check the Mandatory option.
4. Select when the constraint should be checked (Flow/Static).
6-10
Keys
1. Select the Constraints node under the datastore.
2. Right-click, select Insert Key.
3. Fill in the Name.
4. Select the Key or Index Type.
5. Go to the Columns tab.
6. Add/remove columns from the key.
6-11
1. Go to the Control tab.
2. Select whether the key is Defined in the Database, and is Active.
3. Select when the constraint must be checked (Flow/Static).
4. Click the Check button to perform a synchronous check of the key.
6-12
Creating a Reference
1. Select the Constraints node under the datastore.
2. Right-click, select Insert Reference.
3. Fill in the Name.
4. Select the reference type: User Reference or Complex Reference.
Set the model and table to <undefined> to manually enter the catalog, schema and table name.
6-14
1. Go to the Columns tab.
2. Click the Add button.
3. Select the column from the Foreign Key table.
4. Select the corresponding column from the Primary Key table.
5. Repeat for all column pairs in the reference.
6-15
1. Go to the Expression tab.
2. Set the Alias for the Primary Key table.
3. Code the Expression. Prefix columns with the table aliases. Use the Expression Editor.
6-16
1. Go to the Control tab.
2. Choose when the constraint should be checked (Flow/Static).
3. Click the Check button to immediately check the reference. This is not possible for heterogeneous references.
6-17
Creating a Condition
1. Right-click the Constraints node, select Insert Condition.
2. Fill in the Name.
3. Select the ODI Condition type.
4. Edit the condition clause. Use the Expression Editor.
6-18
1. Go to the Control tab.
2. Select when the constraint must be checked (Flow/Static).
3. Click the Check button to perform a synchronous check of the condition.
6-19
6-20
10
6-21
4. Set the FLOW_CONTROL and/or STATIC_CONTROL IKM options to Yes. Set RECYCLE_ERRORS to Yes if you want to recycle errors from previous runs.
6-22
11
6-23
6-24
12
Interface
Interface
Model
Interface
Interface
Model
Model
Interface
Possible
Never
Always
6-25
To see which records were rejected: 1. Select the target datastore in the Models view. 2. Right-click > Control > Errors 3. Review the erroneous rows.
6-26
13
Lesson summary
Enabling Quality Control
6-27
6-28
14
7
7-1
Objectives
After completing this lesson, you should understand:
Why Metadata are important in ODI Where to find your database metadata in ODI How to import Metadata from your databases How to use ODI to generate your models
7-2
Why Metadata?
ODI is strongly based on the relational paradigm.
In ODI, data are handled through tabular structures defined as datastores.
Datastores are used for all types of real data structures: database tables, flat files, XML files, JMS messages, LDAP trees, etc.
The definition of these datastores (the metadata) will be used in the tool to design the data integration processes.
Defining the datastores is the starting point of any data integration project.
7-3
Models
7-4
Model Description
Models are the objects that will store the metadata in ODI. They contain a description of a relational data model. It is a group of datastores stored in a given schema on a given technology. A model typically contains metadata reverse-engineered from the real data model (Database, flat file, XML file, Cobol Copybook, LDAP structure) Database models can be designed in ODI. The appropriate DDLs can then be generated by ODI for all necessary environments (development, QA, production)
7-5
Terminology
All the components of relational models are described in the ODI metadata:
Relational Model                 Description in ODI
Table; Column                    Datastore; Column
Not Null; Default value          Not Null / Mandatory; Default value
Primary keys; Alternate keys     Primary keys; Alternate keys
Indexes; Unique indexes          Not unique indexes; Alternate keys
Foreign key                      Reference
Check constraint                 Condition
7-6
Additional Metadata
Filters
Apply when data is loaded from a datastore.
Heterogeneous references
Link datastores from different models/technologies
7-7
7-8
Customized reverse-engineering
Reads metadata from the application/database system repository, then writes these metadata to the ODI repository.
Uses a technology-specific strategy, implemented in a Reverse-engineering Knowledge Module (RKM).
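Reading metadata from a system catalog can be illustrated with SQLite, whose catalog is reachable through sqlite_master and PRAGMA table_info. This is a sketch of the idea only, not an RKM: the function reverse_engineer, the dictionary layout and the CUSTOMER table are hypothetical.

```python
import sqlite3

def reverse_engineer(conn):
    """Build a simple model (table -> column descriptions) from the catalog."""
    model = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Each row: (cid, name, type, notnull, default value, pk position)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        model[table] = [
            {
                "column": name,
                "type": col_type,
                "mandatory": bool(notnull),
                "primary_key": bool(pk),
            }
            for _cid, name, col_type, notnull, _default, pk in columns
        ]
    return model

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMER (CUST_ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL, CITY TEXT)"
)
model = reverse_engineer(conn)
print(model)
```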
7-9
ODI Repository
Model (Metadata)
Delimited format
MS SQL Server
JDBC Driver
Standard Reverse-engineering
Data Model
System tables
Customized Reverse-engineering
7-10
7-11
Note
Reverse-engineering is incremental. New metadata is added, but old metadata is not removed.
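The incremental behavior described in this note can be pictured as a dictionary merge. The sketch below is a simplified illustration with hypothetical names; refreshing the definition of datastores that already exist is an assumption made here, not something the note states.

```python
def incremental_reverse(model, database_tables):
    """Merge freshly read table definitions into an existing model."""
    merged = dict(model)
    for name, columns in database_tables.items():
        # New tables are added; existing ones are refreshed (assumption).
        merged[name] = columns
    # Tables that no longer exist in the database are deliberately kept.
    return merged

existing_model = {"OLD_TABLE": ["ID"], "CUSTOMER": ["ID"]}
in_database = {"CUSTOMER": ["ID", "NAME"], "ORDERS": ["ID"]}
new_model = incremental_reverse(existing_model, in_database)
print(sorted(new_model))
```

Obsolete datastores such as OLD_TABLE therefore have to be removed from the model by hand.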
7-12
7-13
1. Go to the Models view.
2. Select Insert Model.
3. Fill in the Name (and Code).
4. Select the model Technology.
5. Select the Logical Schema where the model is found.
6. Fill in the Description (optional).
7-14
Note
A model is always defined in a given technology. If you change a model's technology, you must check every object related to that model.
7-15
7-16
1. Go to the Selective Reverse tab (Standard reverse only).
2. Check the Selective Reverse option.
3. Select New Datastores and/or Existing Datastores.
4. Click Objects to Reverse.
5. Select the datastores to reverse-engineer.
6. Click the Reverse button.
7-17
7-18
7-19
7-20
10
7-21
Lesson summary
Reverse-engineering
Fleshing out models: why and how
7-22
11
7-23
12
8
8-1
Objectives
After completing this course, you will:
Understand the basic concepts behind the Topology interface. Understand logical and physical architecture. Know how to plan a Topology. Have learnt current best practices for setting up a Topology.
8-2
What is the Topology?
Topology: the representation of the information system in ODI:
Technologies: Oracle, DB2, File, etc.
Datatypes for the given technology
Data Servers for each technology
Physical Schemas under each data server
ODI Agents (run-time modules)
Definition of Languages and Actions
8-3
8-4
8-5
Concepts in Reality
Technology               Data server     Schema
Oracle                   Instance        Schema
Microsoft SQL Server     Server          Database/Owner
Sybase ASE               Server          Database/Owner
DB2/400                  Server          Library
Teradata                 Server          Schema
Microsoft Access         Database        (N/A)
JMS Topic                Router          Topic
File                     File Server     Directory
8-6
Important Notes
It is strongly recommended that for each data server you create a dedicated area for ODI's temporary objects and use it as the Work Schema.
Under each data server, define a physical schema for each sub-division of the server that will be used.
8-7
Example Infrastructure
Production site: Boston Windows
MS SQL Server
Linux
Oracle 9i
ACCOUNTING
db_dwh db_purchase
Oracle 10g
SALES
Windows
MS SQL Server B
Linux
Oracle
db_dwh db_purchase
ACCT SAL
8-8
Oracle-Boston9
ACCOUNTING
Oracle-Boston10
SALES
Legend
Data server
Physical schema
MSSQL-TokyoA
dwh
MSSQL-TokyoB
Oracle-Tokyo
ACCT
purchase
SAL
8-9
8-10
Important Note
The user name is used to access all underlying schemas, databases or libraries in the data server. Make sure this user account has sufficient privileges.
8-11
1. Right-click the technology of your data server.
2. Select Insert Data Server.
3. Fill in the Name.
4. Fill in the connection settings: Data Server, User and Password.
8-12
1. Select the JDBC tab.
2. Fill in the JDBC driver.
3. Fill in the JDBC URL.
4. Test the connection.
5. Click OK.
8-13
Use the select button to choose the driver class name and URL template.
8-14
1. Click the Test button.
2. Select the Agent to test this connection. Local (No Agent) performs the test with the Topology Manager GUI.
8-15
8-16
1. Right-click the data server and select Insert Physical Schema.
2. Select or fill in the Data Schema and the Work Schema.
8-17
8-18
What is a Logical Schema? Developers should not have to worry about the actual location of the data servers, or the updates in user names, IP addresses, passwords, etc. To isolate them from the actual physical layer, the administration will create a Logical Schema that is simply an alias for the physical layer.
8-19
8-20
10
8-21
Logical Architecture
But changing the connectivity from one server to the other can become painful
Physical Architecture
Windows
MS SQL Server db_dwh
Windows
MS SQL Server A dwh
Windows
MS SQL Server db_dwh db_purchase
8-22
11
For that purpose, the definition of Contexts will allow you to attach more than one physical definition to a Logical Schema
Production
Physical Architecture
Windows
MS SQL Server db_dwh
Development
Windows
MS SQL Server A dwh
Windows
MS SQL Server db_dwh db_purchase
8-23
Datawarehouse
(Logical Schema)
Purchase
(Logical Schema)
Logical Architecture
Production
Production
Production
Unix
Windows
MS SQL Server db_dwh db_purchase
MS SQL Server
CRM
8-24
12
Notes
Logical resources may remain unmapped to any physical resource in a given context.
Unmapped resources cannot be used in the context.
A single physical resource may be mapped in several contexts.
In a given context, a logical resource is mapped to at most one physical resource.
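The relationship between contexts, logical schemas and physical schemas behaves like a lookup keyed on (context, logical schema). The following is a minimal sketch of that idea with a hypothetical ContextMap class; it is not an ODI API. The schema and context names are borrowed from the slides, and which physical schema belongs to which context is made up for illustration.

```python
class ContextMap:
    """Resolve a logical schema to a physical schema within a context."""

    def __init__(self):
        self._mapping = {}  # (context, logical schema) -> physical schema

    def map(self, context, logical, physical):
        # One key holds one value: at most one physical schema per
        # logical schema in a given context.
        self._mapping[(context, logical)] = physical

    def resolve(self, context, logical):
        try:
            return self._mapping[(context, logical)]
        except KeyError:
            # Unmapped logical resources cannot be used in this context.
            raise LookupError(f"{logical!r} is not mapped in context {context!r}")

topology = ContextMap()
topology.map("Production", "Datawarehouse", "db_dwh")
topology.map("Development", "Datawarehouse", "dwh")
topology.map("Production", "Purchase", "db_purchase")

prod_dwh = topology.resolve("Production", "Datawarehouse")
dev_dwh = topology.resolve("Development", "Datawarehouse")
print(prod_dwh, dev_dwh)
```

Developers only ever name the logical schema; switching from Development to Production changes the context passed to resolve, not the design.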
8-26
13
Technology
The same technologies are displayed in Physical and Logical Architecture views. You can reduce the number of technologies displayed
Windows > Hide Unused Technologies
8-27
1. Double-click the context.
2. Go to the Agents tab.
3. For each logical agent, select the corresponding physical agent in the context.
4. Go to the Schemas tab.
5. For each logical schema, select the corresponding physical schema in the context.
6. Click OK.
8-28
14
8-29
8-30
15
Contexts Development
Sales
SALES in Oracle on Windows
Tokyo
8-31
8-32
16
8-33
JDBC Driver
A JDBC driver is a Java driver that provides access to a type of database.
Type 4: Direct access via TCP/IP
Type 3: Three-tier architecture
Type 2: Requires the database client layer
Type 1: Generic driver to connect ODBC data sources
8-34
17
Technology: Driver; URL
Oracle: oracle.jdbc.driver.OracleDriver; jdbc:oracle:thin:@<host>:<port>:<sid>
Microsoft SQL Server: com.inet.tds.TdsDriver; jdbc:inetdae7:<host>:<port>
Sybase (ASE, ASA, IQ): com.sybase.jdbc2.jdbc.SybDriver; jdbc:sybase:Tds:<host>:<port>/[<db>]
DB2/UDB (type 2): COM.ibm.db2.jdbc.app.DB2Driver; jdbc:db2:<database>
DB2/400: com.ibm.as400.access.AS400JDBCDriver; jdbc:as400://<host>[;libraries=<library>]
Teradata: URL jdbc:teradata://<host>:<port>/<server>
Microsoft Access (type 1): URL jdbc:odbc:<odbc_dsn_alias>
File (Sunopsis driver): URL jdbc:snps:dbfile
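Filling in a URL template is a plain string substitution. The sketch below covers three of the templates listed above, rewritten with Python format fields; the function build_url and the host, port and database values are hypothetical, and optional parts shown in brackets on the slide are treated as always present for simplicity.

```python
# URL templates from the slide, with <name> placeholders turned into {name} fields.
URL_TEMPLATES = {
    "Oracle": "jdbc:oracle:thin:@{host}:{port}:{sid}",
    "Sybase": "jdbc:sybase:Tds:{host}:{port}/{db}",
    "DB2/400": "jdbc:as400://{host};libraries={library}",
}

def build_url(technology, **params):
    """Fill the template for a technology; a missing parameter raises KeyError."""
    return URL_TEMPLATES[technology].format(**params)

oracle_url = build_url("Oracle", host="dbhost", port=1521, sid="ORCL")
sybase_url = build_url("Sybase", host="dbhost", port=5000, db="sales")
print(oracle_url)
print(sybase_url)
```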
8-35
Lesson summary
Defining your topology
Data servers & physical schemas
8-36
18
8-37
19
9
9-1
Objectives
After completing this lesson, you will:
Understand the structure and behavior of Knowledge Modules Be able to modify Knowledge Modules and create your own behavior
9-2
Definition
Knowledge Modules are templates of code that define integration patterns and their implementation.
They are usually written to follow Data Integration best practices, but can be adapted and modified for project-specific requirements.
Example:
When loading data from a heterogeneous environment, first create a staging table, then load the data into the staging table.
To load the data, use SQL*Loader. SQL*Loader needs a CTL file, so create the CTL file for SQL*Loader.
When finished with the integration, remove the CTL file and the staging table.
9-3
When processing happens between two data servers, a data transfer KM is required.
Before integration (Source to Staging Area): requires an LKM, which is always multi-technology.
At integration (Staging Area to Target): requires a multi-technology IKM.
When processing happens within a data server, it is entirely performed by the server.
A single-technology IKM is required. No data transfer is performed.
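The rules above can be read as a small decision function. The sketch below is only an illustration of that reasoning; the function name and the server labels are hypothetical and this is not how ODI itself selects KMs.

```python
def required_kms(source_server, staging_server, target_server):
    """Return the kinds of KM needed, following the two rules stated above."""
    kms = []
    if source_server != staging_server:
        # Data crosses from the source to the staging area: a loading KM is needed.
        kms.append("LKM (multi-technology)")
    if staging_server != target_server:
        # Data crosses from the staging area to the target.
        kms.append("IKM (multi-technology)")
    else:
        # Staging area and target are on the same server: no transfer at integration.
        kms.append("IKM (single-technology)")
    return kms

# Staging area on the target server.
print(required_kms("sybase", "oracle", "oracle"))
# Staging area on a third server.
print(required_kms("sybase", "memory_engine", "oracle"))
```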
More on KMs
Case #1
Using the Target as the Staging Area
[Diagram: Source (Sybase) holds ORDERS and LINES; a CORRECTIONS file is a second source. The staging area is on the Target (Oracle), which holds SALES. LKM_1 (LKM SQL to Oracle) loads C$_0; LKM_2 (LKM File to Oracle (SQLLDR)) loads C$_1.]
Case #2
Using a third server as the Staging Area
[Diagram: Source (Sybase) holds ORDERS; a CORRECTIONS file is a second source. The staging area is a third server (Sunopsis Memory Engine) holding C$_1 and I$_SALES. LKM_2 (LKM File to SQL) loads the file; IKM_1 (IKM SQL to SQL Append) populates SALES on the Target (Oracle).]
Case #3
Using the Source as the Staging Area
[Diagram: the staging area is on the Source (Sybase), which holds ORDERS, LINES and C$_1. LKM_1 (LKM File to SQL) loads the CORRECTIONS file; IKM_1 populates SALES on the Target (Oracle).]
KM Types
There are six different types of knowledge modules:
KM Type | Used in | Description
LKM | Interfaces | Assembles data from source datastores to the staging area.
IKM | Interfaces | Uses a given strategy to populate the target datastore from the staging area.
CKM | Interfaces | Checks data in a datastore or during an integration process.
RKM | Models | Retrieves the structure of a data model from a database. Only needed for customized reverse-engineering.
JKM | Models | Sets up a system for Changed Data Capture to reduce the amount of data that needs to be processed.
SKM | Models | Defines the code that will be generated to create Data Web Services (exposing data as a web service).
5. Click OK
Description
A Knowledge Module is made of steps. Each step has a name and a template for the code to be generated. These steps are listed in the Details tab. The code that will be generated by ODI will list the same step names
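To picture the relationship between steps and generated code, here is a minimal sketch. The step names, the templates and the generate helper are hypothetical; real KM steps use the ODI substitution methods rather than Python formatting.

```python
# Hypothetical KM: each step pairs a name with a code template.
km_steps = [
    ("Create staging table", "create table {staging} (id integer)"),
    ("Load staging table", "insert into {staging} select id from {source}"),
    ("Drop staging table", "drop table {staging}"),
]

def generate(steps, **names):
    """Render every template; the output keeps the step names, in the same order."""
    return [(name, template.format(**names)) for name, template in steps]

generated = generate(km_steps, staging="C$_0", source="ORDERS")
for name, code in generated:
    print(name, "->", code)
```

The generated list carries the same step names as the KM, which is what makes the execution log easy to map back to the module.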
Options
KMs have options that will:
Allow users to turn options on or off
Let users specify or modify values used by the KM
Options are defined in the projects tree, under the KM.
Options are used in the KM code with the substitution method <%=odiRef.getOption(OptionName)%>.
On/Off options are defined in the Options tab of each step of the KM.
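The effect of an option reference in KM code can be imitated outside ODI. The sketch below is an emulation only: the render helper, the quoted option names TARGET and BATCH_ID, and the regular expression are assumptions made for illustration, not the ODI substitution engine.

```python
import re

# Matches a call written as <%=odiRef.getOption("NAME")%> (quoted name assumed).
OPTION_CALL = re.compile(r'<%=odiRef\.getOption\("(\w+)"\)%>')

def render(template, options):
    """Replace each getOption call with the current value of that option."""
    return OPTION_CALL.sub(lambda m: str(options[m.group(1)]), template)

step_code = ('delete from <%=odiRef.getOption("TARGET")%> '
             'where batch = <%=odiRef.getOption("BATCH_ID")%>')
rendered = render(step_code, {"TARGET": "SALES", "BATCH_ID": 42})
print(rendered)
```

Changing the option value changes the generated statement without touching the step template, which is the point of exposing options to KM users.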
getColList
Returns a list of columns and expressions. The result will depend on the current phase (Loading, integration, control).
getTargetTable
Returns general information on the current target table.
getTable
Returns the full name of the temporary or permanent tables handled by ODI.
getObjectName
Returns the full name of a physical object, including its catalog and schema.
getInfo Method
Syntax in a KM or Procedure
<%=snpRef.getInfo("pPropertyName")%>
getColList Method
Values returned according to the phase:
Loading (in an LKM)
  To build loading tables
  To feed loading tables
Integration (in an IKM)
  To build the integration table
  To feed the integration table
Control (in a CKM)
  To build the integration table and feed it
  To control the constraints
getColList Method
Syntax
<%=snpRef.getColList("pStart", "pPattern", "pSeparator", "pEnd", "pSelector")%>
Where:
pStart is the string to insert before the pattern.
pPattern is the string used to identify the returned values. Ex: [COL_NAME] returns a list of column names. Several patterns can be declared.
pSeparator is the character to insert between the returned patterns.
pEnd is the string to insert at the end of the list.
pSelector is the string that defines a Boolean expression used to filter the elements of the initial list.
getColList Examples
Retrieve a list of columns and their data types (loading phase):
<%=snpRef.getColList("(", "[COL_NAME] [SOURCE_CRE_DT] null", ",\n", ")", "")%>
Retrieve the list of columns of the target to create the loading tables:
<%=snpRef.getColList("", "[CX_COL_NAME]\t[DEST_CRE_DT] " + snpRef.getInfo("DEST_DDL_NULL"), ",\n", "","")%>
Retrieve the list of columns to be updated in the target (integration phase):
<%=snpRef.getColList("(", "[COL_NAME]", ",\n", ")", "INS OR UPD")%>
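To make the role of each parameter concrete, the following sketch emulates the start, pattern, separator and end arguments in plain Python. It is not the ODI implementation: the col_list function, the sample column metadata and the omission of the selector argument are all simplifying assumptions.

```python
def col_list(columns, start, pattern, separator, end):
    """Emulate the parameters described above: start, one pattern per column, separator, end."""
    rendered = []
    for col in columns:
        text = pattern
        for key, value in col.items():
            # Each [KEY] token in the pattern is replaced by the column's value.
            text = text.replace("[" + key + "]", value)
        rendered.append(text)
    return start + separator.join(rendered) + end

# Hypothetical column metadata for illustration.
columns = [
    {"COL_NAME": "CUST_ID", "SOURCE_CRE_DT": "numeric(10)"},
    {"COL_NAME": "CUST_NAME", "SOURCE_CRE_DT": "varchar(50)"},
]
ddl = col_list(columns, "(", "[COL_NAME] [SOURCE_CRE_DT] null", ",\n", ")")
print(ddl)
```

With this sample metadata the result is a parenthesized column definition list that could follow a create table statement.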
Modifying a KM
Very few KMs are ever created from scratch. They are usually extensions or modifications of existing KMs.
To speed up development, duplicate existing steps and modify them. This will prevent typos in the syntax of the odiRef methods.
If you modify a KM that is being used, all interfaces using that KM will inherit the new behavior.
Remember to make a copy of the KM if you do not want to alter existing interfaces. Then modify the copy, not the original.
Modifying a KM that is already used is a very efficient way to implement modifications in the data flow and affect all existing developments.
Lesson summary
10
Objectives
After completing this lesson, you will:
Understand why CDC can be needed
Understand the CDC infrastructure in ODI
Know what types of CDC implementations are possible with ODI
Know how to set up CDC
Introduction
The purpose of Changed Data Capture is to allow applications to process changed data only.
Loads will only process changes since the last load.
The volume of data to be processed is dramatically reduced.
CDC is extremely useful for near real time implementations, synchronization, and Master Data Management.
CDC in ODI
CDC in ODI is implemented through a family of KMs: the Journalization KMs.
These KMs are chosen and set in the model.
Once the journals are in place, the developer can choose from the interface whether to use the full data set or only the changed data.
A series of views is created to join the journal table with the actual data.
When other KMs need to select data, they will know to use the views instead of the tables.
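The idea of a journal joined to the data through a view can be tried with an in-memory SQLite database. This is a simplified sketch: the table and view names (J_CUSTOMER, JV_CUSTOMER), the FLAG column and the inner join are assumptions for illustration, and the structures a real JKM creates are more elaborate.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    create table CUSTOMER (CUST_ID integer primary key, NAME text);
    -- Hypothetical journal table: one row per changed primary key.
    create table J_CUSTOMER (CUST_ID integer, FLAG text);
    -- View joining the journal with the actual data.
    create view JV_CUSTOMER as
        select j.FLAG, c.CUST_ID, c.NAME
        from J_CUSTOMER j join CUSTOMER c on c.CUST_ID = j.CUST_ID;
    insert into CUSTOMER values (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    insert into J_CUSTOMER values (2, 'I');
""")

# Selecting from the view returns only the changed row, not the full table.
changed = db.execute("select CUST_ID, NAME from JV_CUSTOMER").fetchall()
print(changed)
```

An interface reading from the view processes one row here instead of three, which is the volume reduction CDC is meant to deliver.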
Consistent CDC
The mechanisms put in place by Consistent CDC solve the issues faced with simple CDC.
The difference here is that children records are locked before the parent records are processed.
As new parent records and children records come in, both parent and children records are ignored.
Note: all these steps can either be implemented in the Knowledge Modules or done separately, as part of the Workflow management.
Using CDC
Set a JKM in your model.
For all the following steps, right-click on a table to process just that table, or right-click on the model to process all tables of the model:
1. Add the table to the CDC infrastructure: right-click on a table and select Changed Data Capture / Add to CDC.
2. For consistent CDC, arrange the datastores in the appropriate order (parent/child relationship): in the model definition, select the Journalized tables tab and click the Reorganize button.
3. Add the subscriber (the default subscriber is SUNOPSIS): right-click on a table and select Changed Data Capture / Add subscribers.
4. Start the journals: right-click on a table and select Changed Data Capture / Start Journal.
Lesson summary
Implement CDC
11
Objectives
In this lesson, you will:
Learn how ODI Packages are used to create a complete workflow.
See how to create several different kinds of package steps.
Learn how to execute a package.
What Is a Package?
Package: An organized sequence of steps that makes up a workflow. Each step performs a small task, and they are combined together to make the package.
1. Create and name a blank package.
2. Create the steps that make up the package:
   Drag interfaces from the Projects view onto the Diagram tab.
   Insert ODI tools from the toolbox.
3. Define the first step, define the success path, and set up error handling.
Diagram
Select Next step on success Next step on failure Duplicate selection Delete selection Rearrange selection
ODI tools: Macros that provide useful functions to handle files, send emails, use web services, etc. Tools can be used as steps in packages.
4. Change the Step Name in the Properties panel.
5. Set the tool's Properties.
6. Click Apply to save.
A Simple Package
First step Step on success
Step on failure
This package executes two interfaces then archives some files. If one of the three steps fails, an email is sent to the administrator.
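The control flow of such a package can be sketched as a table of steps, each with a success link and a failure link. The step names and the run_package helper below are hypothetical; the sketch only illustrates the sequencing logic, with one interface forced to fail.

```python
def run_package(steps, first, actions):
    """Follow the success or failure link of each step until no step remains."""
    log, current = [], first
    while current is not None:
        ok = actions[current]()
        log.append((current, "ok" if ok else "failed"))
        on_success, on_failure = steps[current]
        current = on_success if ok else on_failure
    return log

# step name -> (next step on success, next step on failure); names are hypothetical
steps = {
    "load_customers": ("load_orders", "mail_admin"),
    "load_orders": ("archive_files", "mail_admin"),
    "archive_files": (None, "mail_admin"),
    "mail_admin": (None, None),
}
actions = {name: (lambda: True) for name in steps}
actions["load_orders"] = lambda: False   # simulate a failing interface

log = run_package(steps, "load_customers", actions)
for entry in log:
    print(entry)
```

Because all three working steps share the same failure link, a single email step covers every failure path.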
Executing a Package
1. Click the Execute button in the package window.
2. Open Operator.
The package is executed as a session.
Each package step is a step.
Tool steps appear with a single task.
Interface steps show each command as a separate task.
Lesson Summary
Executing a package and viewing the log
Sequencing steps with error handling
12
Objectives
After completing this lesson, you will:
Understand Metadata Navigator
Know how to use Metadata Navigator
Be able to explain the features in Metadata Navigator
Purpose
Metadata Navigator gives access to the metadata repository from a Web interface.
It is a read-only interface (see Lightweight Designer for an interactive interface).
It can build graphical flow maps and data lineage based on the metadata.
Overview
By default, Metadata Navigator shows the projects available in the repository. Each user's menu is customized based on their privileges.
Repository Objects
All objects in the repositories can be viewed. Hyperlinks let you jump from one object to the other.
Data Lineage
For data lineage, MN will list the source datastores and target datastores for any element. You can click on any icon in the graph to get further lineage
Flow Maps
Flow maps show the dependencies between models (or datastores) and projects (or interfaces).
You can choose the level of detail that you want.
Execution of a Scenario
Select the values from the drop-down menus.
Set the value for the parameters.
Execute!
Lesson summary
How to Use Metadata Navigator
13
Objectives
After completing this lesson, you will:
Understand why Web Services are used
Understand the different types of Web Services
Know how to set up these web services
Environment
In this presentation, Apache Tomcat 5.5 or Oracle Container for Java (OC4J) is used as the application server, with Apache Axis2 as the Web Services container. Examples may need to be adapted if using other Web Services containers.
Context.xml
Add the following entry in the file
Resource name will be re-used in the web.xml file and in the Model in Designer.
driverClassName, url, username and password will explicitly point to the data source.
<Context>
  <Resource name="jdbc/Oracle/Win"
            type="javax.sql.DataSource"
            driverClassName="oracle.jdbc.OracleDriver"
            url="jdbc:oracle:thin:@SRV1:1521:ORA10"
            username="mydatabaseuser"
            password="mydatabasepassword"
            maxIdle="2" maxWait="-1" maxActive="4"/>
</Context>
OC4J
Update the file <oc4j_home>\j2ee\home\config\data-sources.xml with the appropriate connection information:
<!-- The following is an example of a data source whose connection factory emulates XA behavior. -->
<managed-data-source name="OracleDS"
    connection-pool-name="Example Connection Pool"
    jndi-name="jdbc/OracleDS"/>
<connection-pool name="Example Connection Pool">
  <connection-factory factory-class="oracle.jdbc.pool.OracleDataSource"
      user="system" password="system"
      url="jdbc:oracle:thin:@//localhost:1521/XE">
  </connection-factory>
</connection-pool>
Tomcat
Add the following entry in the context.xml file:
<Context>
  <Resource name="jdbc/Oracle/Win"
            type="javax.sql.DataSource"
            driverClassName="oracle.jdbc.OracleDriver"
            url="jdbc:oracle:thin:@SRV1:1521:ORA10"
            username="mydatabaseuser"
            password="mydatabasepassword"
            maxIdle="2" maxWait="-1" maxActive="4"/>
</Context>
Resource name will be re-used in the web.xml file and in the Model in Designer.
driverClassName, url, username and password will explicitly point to the data source.
Update the web.xml file with the resource name of the context.xml file (here res-ref-name):
<resource-ref>
  <description>Data Integrator Data Services on Oracle_SRV1</description>
  <res-ref-name>jdbc/Oracle/Win</res-ref-name>
  <res-type>javax.sql.DataSource</res-type>
  <res-auth>Container</res-auth>
</resource-ref>
Select a method, enter the mandatory parameters, click ( ) to test the web service. The results of the call will be displayed in a grid
To modify the behavior of the web services, you can edit the SKM like any other Knowledge Module
Lesson summary
Different Types of Web Services
Setup Data Web Services
14
Objectives
After completing this lesson, you will know how to use:
Variables
Sequences
User Functions
Advanced Mappings
Variables
What Is a Variable?
Variable: An ODI object that stores a typed value, such as a number, string or date. Variables are used to customize transformations as well as to implement control structures, such as if-then statements and loops, in packages.
Variable Scope
Scope defines where the variable can be used.
Global variables are accessible everywhere (defined in the Others tab of the Designer tree).
Project variables are accessible within the project (and defined under the project in the Designer tree).
Defining Variables
Description of variable
Using Variables
Tip: Use the Expression Editor to avoid mistakes in variable names.
Variables are used either by string substitution or by parameter binding.
Substitution: #<project_code>.<variable_name>
Binding: :<project_code>.<variable_name>
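The difference between the two modes can be shown with an in-memory SQLite database. This is a sketch under assumptions: the project code SALES and variable STATUS are hypothetical, and SQLite named parameters cannot contain a dot, so the bind placeholder is written as :status.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("create table ORDERS (ORDER_ID integer, STATUS text)")
db.executemany("insert into ORDERS values (?, ?)", [(1, "OPEN"), (2, "CLOSED")])

variables = {"SALES.STATUS": "OPEN"}   # hypothetical project code and variable name

# Substitution: the value is pasted into the statement text before it is sent.
substituted = "select ORDER_ID from ORDERS where STATUS = '#SALES.STATUS'".replace(
    "#SALES.STATUS", variables["SALES.STATUS"])

# Binding: the statement keeps a placeholder and the value travels separately.
bound_sql = "select ORDER_ID from ORDERS where STATUS = :status"

by_substitution = db.execute(substituted).fetchall()
by_binding = db.execute(bound_sql, {"status": variables["SALES.STATUS"]}).fetchall()
print(substituted)
print(by_substitution, by_binding)
```

Both queries return the same rows; with substitution the quoting of string values is the developer's responsibility, whereas with binding the driver handles the value.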
Variable Steps
Declare Variable step type:
Forces a variable to be taken into account. Use this step for variables used in transformations, or in the topology.
Note
ODI sequences are not as fast as DBMS sequences. DBMS sequences should be used wherever possible.
User Functions
User Function: A cross-technology macro defined in a lightweight syntax, used to create an alias for a recurrent piece of code or to encapsulate a customized transformation.
Simple Example #1
A simple formula:
If <param1> is null then <param2> else <param1> end if
Other technologies:
case when <param1> is null then <param2> else <param1> end
Simple Example #2
A commonly used formula:
If <param1> = 1 then Mr
else if <param1> = 2 then Ms
else if <param1> = 3 then Mrs
else if <param2> = 77 then Dr
else if <param2> = 78 then Prof
else
end if
1. Select the Functions node in a project, or in the Others view.
2. Right-click > Insert Function.
3. Fill in the Name, Syntax and Description.
1. Select the Implementations tab.
2. Click the Add button.
3. Enter the code for the implementation.
4. Select the applicable technologies.
Argument names in the syntax and implementation should match exactly. Examples:
Syntax:
NullValue($(myvariable), $(mydefault))
Implementation:
case when $(myvariable) is null then $(mydefault) else $(myvariable) end
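How a call to a user function is replaced by its implementation can be sketched as a simple macro expansion. The expand helper is hypothetical and deliberately naive (it splits arguments on commas and does not handle nested calls); ODI performs this replacement itself at code generation time.

```python
import re

SYNTAX = "NullValue($(myvariable), $(mydefault))"
IMPLEMENTATION = "case when $(myvariable) is null then $(mydefault) else $(myvariable) end"

def expand(call, syntax=SYNTAX, implementation=IMPLEMENTATION):
    """Bind the call arguments to the $(name) arguments, then fill the implementation."""
    names = re.findall(r"\$\((\w+)\)", syntax)
    inner = call[call.index("(") + 1 : call.rindex(")")]
    args = [a.strip() for a in inner.split(",")]   # naive split: no nested commas
    if len(args) != len(names):
        raise ValueError("argument count mismatch")
    result = implementation
    for name, arg in zip(names, args):
        result = result.replace("$(" + name + ")", arg)
    return result

# CUST.PHONE is a hypothetical column used only for illustration.
expanded = expand("NullValue(CUST.PHONE, 'n/a')")
print(expanded)
```

The sketch also shows why the argument names must match exactly: a name present in the implementation but absent from the syntax would be left unreplaced.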
At design-time, you can refer to user functions like any regular database function
You can use them in mappings, joins, filters, procedures, etc. They are available in the expression editor.
Advanced Mappings
Lesson summary
Defining and using sequences
Defining and using User Functions
15
Objectives
After completing this lesson, you will know how to:
Create simple reusable procedures.
Add commands.
Provide options on your commands.
Run your procedures.
Use the procedure in a package.
What is a procedure?
Procedure: A sequence of commands executed by database engines, the operating system, or using ODI Tools. A procedure can define options that control its behavior. Procedures are reusable components that can be inserted into packages.
Procedure Examples
Email Administrator procedure
1. Uses the SnpsSendMail ODI tool to send an administrative email to a user. The email address is an option.
Deletes the contents of the /temp directory using the SnpsFileDelete tool.
Runs DELETE statements on these tables in order: CUSTOMER, CITY, REGION, COUNTRY.
1. Right-click the Procedures node under a project.
2. Select Insert Procedure.
3. Fill in the Name and Description.
More Elements
In addition, we have access to the following ODI-specific elements that can be used within the commands:
Variables: They may be specified either in substitution mode #<variable>, or in bind mode :<variable>.
Sequences: They may be specified either in substitution mode #<sequence>, or in bind mode :<sequence>.
User Functions: Used like DBMS functions. They are replaced at code generation time by their implementation.
Types of Options
Options are typed procedure parameters
Checkbox, value or text
Options have a default value. The value can also be specified when the procedure is used.
1. Right-click the name of the procedure.
2. Select Insert Option.
3. Fill in the Name, Description and Help.
4. Select a Default Value, Type, and Position to display in.
1. Open a command, then select the Options tab.
2. Check the option box if it should trigger the command execution.
3. If the command should run independently of any options, check the Always Execute option.
The value of the option is specified when the procedure is used within a package. Otherwise, the default value is used.
Procedure Execution
A procedure can be executed manually for testing.
Default option values are used.
15-17
The procedure is executed as a session with one step.
There is one task for each command.
Warning: error was ignored
Tasks completed successfully
Error that was not ignored
Task waiting to be run
1. Under a project, select the procedure which you want to work with.
2. Drag and drop it into the package.
3. Set the step Name. There is only one type of procedure step.
4. Override any options on the Options tab.
1. Select the model, sub-model or datastore which you want to work with in the Models view.
2. Drag and drop it into the package.
3. Select the type of operation to perform.
4. Set the Options for the chosen operation.
Reverse-Engineering type:
The reverse method defined for the model is used. If using customized reverse-engineering, you must set the RKM options on the Options tab.
Reverse-engineer Journalize
Check type:
The static control strategy (CKM) set for the model and the datastores is used. Options for the CKM are set on the Options tab. Select Delete Errors from the Checked Tables to remove erroneous records.
Controlling Execution
Error Handling
Interfaces fail if a fatal error occurs or if the number of allowed errors is exceeded.
Procedures and other steps fail if a fatal error occurs.
Try to take into account possible errors.
Package Loop
1. Set the counter to an initial value.
2. Execute the step or steps to be repeated.
3. Increment the counter.
4. Evaluate the counter value and loop to step 2 if the goal has not been reached.
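The four steps of a package loop map directly onto a counter-controlled loop. The sketch below mirrors them in Python; the run_loop helper and the goal value are illustrative assumptions, since in a package these steps would be variable steps and an evaluate step.

```python
def run_loop(goal, body):
    counter = 0                 # 1. set the counter to an initial value
    while True:
        body(counter)           # 2. execute the step or steps to be repeated
        counter += 1            # 3. increment the counter
        if counter >= goal:     # 4. evaluate the counter; stop once the goal is reached
            return counter
        # otherwise loop back to step 2

processed = []
final = run_loop(3, processed.append)
print(processed, final)
```

As in the package version, the test comes after the body, so the repeated steps always run at least once.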
Each step's Advanced tab allows you to specify how the next step is determined.
You can specify a number of automatic retries on failure.
Where to go next if the step completes successfully: list of all possible package steps. Choose one to be executed next if this one succeeds.
How many times this step should be re-attempted if it fails, and the time interval (in seconds) between each attempt.
Where to go next if this step fails: list of all possible package steps. Choose the one to be executed next if this one fails.
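The interplay of retries, interval and next-step links can be sketched as follows. The run_step helper, its parameter names and the step names are hypothetical; the sketch only illustrates the behavior described above.

```python
import time

def run_step(action, attempts=0, interval=0.0, on_success=None, on_failure=None):
    """Run action; on failure retry up to `attempts` more times, then pick the next step."""
    tries = 0
    while True:
        tries += 1
        if action():
            return on_success, tries
        if tries > attempts:
            return on_failure, tries
        time.sleep(interval)    # wait between attempts (seconds)

outcomes = iter([False, False, True])       # fails twice, then succeeds
next_step, tries = run_step(lambda: next(outcomes), attempts=2, interval=0.0,
                            on_success="archive_files", on_failure="mail_admin")
print(next_step, tries)
```

With two retries allowed, the step is attempted three times in total before the failure link would be followed.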
Lesson summary
Creating procedures with options
Complex workflows with branching and loops
16
Objectives
After completing this course, you will know:
How to create the Master Repository
How to create the Work Repository
How to connect to the repositories
Installation Process
Once you have downloaded ODI, the installation will require the following steps:
1. Install or unzip ODI on your computer. Please note that if you simply unzip ODI, it requires a Java Virtual Machine version 1.5 or above.
2. Create 2 databases or schemas to host your repositories.
3. Use the provided Wizard to create the Master Repository.
4. Connect to the Master Repository with the Topology interface.
5. Create the Work Repository from the Topology interface.
6. Connect to the Work Repository with the Designer interface.
The users used to connect must have the following privileges:
Connect
Resource
The users' default database must be the one that you created for the repositories.
Tip!
Save the JDBC connection parameters (Driver, URL) in a text editor. Most people create their Master and Work Repositories in the same database, so you will have to enter those same initial values several times.
Lesson summary
Create the Master Repository
Create the Work Repository
Connect to the repositories with the GUI
17
Understanding Agents
Agent Location
The agent has to be in a central location so that it can access
All databases (source and target)
All database utilities (to load/unload for large volumes of data)
The ODI Repository (Master and Work Repository)
Agent Installation
Agents can be installed with the graphical setup.
Agents can be installed manually: simply copy over the \bin, \drivers and \lib directories.
Once installed, an agent can be set as a service on a Windows machine: run the script agentservice.bat.
Installation as a listener only: agentservice -i -a AgentName AgentPort
Installation as a listener and scheduler: agentservice -i -s AgentName AgentPort
AgentName is only mandatory for scheduler agents or agents used for load balancing.
AgentPort is only mandatory if the port is different from the default (20910).
Agent Configuration
For scheduler agents only, you will have to update the file odiparams.bat (or .sh) in the bin directory
Update the parameters to connect to the Master repository.
Encrypt passwords with the following command (from a DOS or Unix prompt):
agent encode MyPassword
(Replace MyPassword with your password.)
4. Click the Test button if the agent is already running.
5. Optional load balancing:
   Set the Maximum number of sessions.
   Define the linked agents on the Load Balancing tab.
Important Note
Always test the connection to check that the agent is correctly configured.
Important Note
One physical agent per agent started.
Lesson summary
18
Tutorial On OTN
PM has put together a tutorial that will take you through the profiling operations and the different features of the product. The tutorial comes with sample data to be profiled and cleansed. We recommend that you do the tutorial to familiarize yourselves with the product.
Requirements
Connections for Profiling are defined in the Metabase and are made of:
A loader connection that defines where the resources are located
Several entities that are loaded in the metabase so that the profiling operations can be performed
Entities are flat files or ODBC connections
Not all records have to be loaded:
  All rows
  First x rows
  Random x% of the rows
  Skip the first x rows
  Dynamic (data not loaded in the metabase)
Profiling Project
The first step to profile data will be to create a profiling project in which you include the entities to be profiled. This will allow you to analyze dependencies between the different entities of the project
Attributes will contain the result of the analysis of the different fields that have been profiled
Compliance with pre-defined rules
Uniqueness level for the column
Patterns (phone numbers, SSN, etc.)
Min/Max values, Min/Max length, etc.
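To give a feel for what such attribute statistics look like, here is a minimal sketch that computes a few of them for one column. It is not the profiling product's algorithm: the profile function, the pattern encoding (9 for a digit, A for a letter) and the sample phone numbers are assumptions for illustration.

```python
import re

def profile(values):
    """Compute a few of the attribute statistics named above for one column."""
    present = [v for v in values if v is not None]
    # Encode each value as a pattern: digits become 9, letters become A.
    patterns = {re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", v)) for v in present}
    return {
        "uniqueness": len(set(present)) / len(present),
        "min_length": min(len(v) for v in present),
        "max_length": max(len(v) for v in present),
        "patterns": sorted(patterns),
    }

stats = profile(["555-1234", "555-9876", "5551234", "555-1234"])
print(stats)
```

Two distinct patterns for a phone column is the kind of finding that would prompt a drill-down to the underlying records.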
Metadata
Double-click on any element to drill down in the details and ultimately view the records
Attributes
Double-click on any element to drill down in the details and ultimately view the records
Create key (or dependency) and double click on it to see duplicate (or orphan) records
Key Analysis
DSD Examples
19
ODP will automatically generate the required steps. You will customize these steps to define your DQ project
Auto-Generated Project
Processes are represented by the arrows: double-click on any arrow to specify the options for each step Books represent intermediate entities (data) in the cleansing process
Cleansing steps
Transformer: non-country-specific transformations, filtering of the data
Global Router: routes country-specific data to a country-specific cleansing process (rules will be country specific)
Country Specific Transformer: at this level, you will specify which fields will be cleansed
Customer Data Parser: identifies and parses names and addresses (country specific)
Sort for Postal Matcher: improves performance for the Postal Matcher (next step)
Postal Matcher: enhances data with dictionaries from the Postal Office
Window Key Generator: prepares records to identify duplicates
Relationship Linker: matches duplicates
Commonizer: enriches duplicates with values from similar records, selects the best surviving record
Data Reconstructor: constructs data for the output, the cleansed data result
20
Objectives
After completing this session, you should:
Understand what ODI offers by way of Versioning
Understand the use of Solutions
Understand how to move objects between repositories
Versions in ODI
Objects in ODI can be Versioned
Projects, Folders, Packages, Interfaces, Procedures, Sequences, Variables, User Functions
Model Folders, Models
Creating a version creates the XML definition of the object, and then stores it in compressed form in the Master Repository version tables.
Versions can then be restored into any connected work repository.
ODI provides a Version Browser.
Creating a version
Right-click the object and select Version / Create.
Does NOT version dependent objects, but DOES include them in this version.
Automatic allocation of the version number (can be overridden).
Prompts for a version description.
Object Type Selector
Specific Object Selector
Restore Version
Export Object to XML
Refresh View
Delete Version(s)
Version Browser
Current version is stored at object level.
You may restore older versions, which will replace the whole object tree (this may delete some objects which did not exist in the older version; if they were not versioned, you will not be able to get them back).
Exporting Objects
Object XML may be exported to an XML file.
Can be used for storage in external source-code-control systems.
Exports can also be imported into other repositories.
ODI Solutions
A solution is a comprehensive and consistent set of interdependent versions of objects. Like other objects, it can be versioned, and may be restored at a later date. Solutions are saved into the master repository. A solution assembles versions of the solution's elements.
Solution
Made up of Principal Elements.
Principal Elements imply the required elements, which will be automatically linked.
Pressing the Synchronize button automatically brings the solution up to date:
adds required elements
removes unused elements
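The effect of synchronizing can be pictured as a dependency closure followed by two set differences. The sketch below is an illustration only: the synchronize function, the requires mapping and the object names are hypothetical and do not reflect how ODI stores solutions.

```python
def synchronize(principal, requires, linked):
    """Return (to_add, to_remove) so that linked equals everything the principals require."""
    needed, stack = set(), list(principal)
    while stack:
        # Walk the dependencies of each element, collecting every required one once.
        for dep in requires.get(stack.pop(), ()):
            if dep not in needed:
                needed.add(dep)
                stack.append(dep)
    return needed - linked, linked - needed

# Hypothetical objects: a package requiring an interface, which requires a model and a KM.
requires = {"PKG_LOAD": ["INT_SALES"], "INT_SALES": ["MODEL_ORACLE", "KM_IKM"]}
to_add, to_remove = synchronize({"PKG_LOAD"}, requires, {"INT_SALES", "OLD_PROC"})
print(sorted(to_add), sorted(to_remove))
```

Elements required only indirectly (here the model and the KM) are added, and an element that nothing requires any more is removed.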
Lesson summary