
DataStage

2010

:: FUNDAMENTAL CONCEPTS ::

DAY 1: Introduction to the Phases of DataStage
There are four different phases in DataStage:

Phase I: Data Profiling
It is used for source system analysis. The analyses are:
1. Column analysis,
2. Primary key analysis,
3. Foreign key analysis,
4. Base line analysis, and
5. Cross domain analysis.
By this analysis we can find whether the data is "dirty" or "not".

Phase II: Data Quality (also called cleansing)
In this process we must follow interdependent steps, i.e., one process after another, as shown below:
Parsing -> Correcting -> Standardizing -> Matching -> Consolidating

Phase III: Data Transmission
The ETL process is done here: the data is transmitted from one stage to another. ETL means E - Extract, T - Transform, L - Load.

Phase IV: Meta Data Management
"Meta data" here means the data about the data.
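To make Phase I concrete, here is a small Python sketch of two of the profiling checks. It is not how DataStage or Information Analyzer implements profiling; the sample rows and the helpers `column_analysis` and `is_primary_key_candidate` are hypothetical. Column analysis counts nulls and distinct values per column, and primary key analysis tests the "unique and not null" rule.

```python
# Hypothetical sample source rows; None marks a missing ("dirty") value.
ROWS = [
    {"eno": 111, "ename": "naveen", "dno": 10},
    {"eno": 222, "ename": "munna", "dno": 20},
    {"eno": 333, "ename": "sravan", "dno": 10},
    {"eno": 444, "ename": None, "dno": 30},
]


def column_analysis(rows):
    """Column analysis: (null count, distinct non-null count) per column."""
    profile = {}
    for col in rows[0]:
        values = [row[col] for row in rows]
        non_null = [v for v in values if v is not None]
        profile[col] = (len(values) - len(non_null), len(set(non_null)))
    return profile


def is_primary_key_candidate(rows, col):
    """Primary key analysis: the column must be unique and not null."""
    values = [row[col] for row in rows]
    return None not in values and len(set(values)) == len(values)


if __name__ == "__main__":
    print(column_analysis(ROWS))
    # {'eno': (0, 4), 'ename': (1, 3), 'dno': (0, 3)}
    print(is_primary_key_candidate(ROWS, "eno"))  # True
    print(is_primary_key_candidate(ROWS, "dno"))  # False (10 repeats)
```

A null in `ename` is exactly the kind of finding that tells us the data is "dirty" before Phase II cleansing starts.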

Navs notes

DAY 2: How does the ETL programming tool work? Pictorial view:
Figure 1: ETL programming process (sources such as databases (db), flat files, MS Excel and business interfaces -> ETL -> DWH -> DM -> BI)

DAY 3: Continued
[Figure: Extract window: data is extracted from the source (.txt, ASCII code) into the staging area (permanent data) and converted into a format DataStage understands (native format). Load window: data in the staging area (after transmission) is loaded into the DWH (a database, or the local repository), or loaded back into .txt (ASCII code).]

ETL is a process that is performed in stages: data moves from the source (S), through OLTP systems and staging areas (sa), to the target (T) and finally into the DWH.

Here, S = source and T = target.

Home Work (HW): one record for each kindle (multiple records for multiple addresses and dummy records for joint accounts).

DAY 4: ETL Developer Requirements

Q: One record for each kindle (multiple records for multiple addresses and dummy records for joint accounts). Kindle means the information of customers.
[Figure: a customer's loan, bank, credit card and savings records together form the kindle.]

A customer maintaining one record while handling different addresses is called "single view customer" or "single version of truth".

HW explanation: Here we must read the query very carefully and understand the terminology of the words from a business perspective. Multiple records means multiple customer records, and multiple addresses means one customer (one account) maintaining multiple addresses, such as savings/credit cards/current account/loan.

ETL Developer Requirements:
Inputs: HLD (high level document) -> LLD (low level document) for the developer.

The ETL developer requirements are:
1. Understanding.
2. Prepare questions: after reading the document that is given, ask friends/forums/team leads/project leads.
3. Logical design: means paper work.
4. Physical model: using a tool.
5. UNIT test.
6. Performance tuning.
7. Peer reviews: nothing but releasing versions (version control *.**, where * is in the range 1-9).
8. Design Turn Over Document (DTD), Detailed Design Document (DDD), Technical Design Document (TDD).
9. Backups: means importing and exporting the required data.
10. Job sequencing.

DAY 5: How is the DWH project undertaken?

Process:
[Figure: project flow: HLD requirements (x) -> Warehouse (WH) HLD (x) -> TD, developer / system engineer -> jobs in Development, 70% - 80% (✓) -> TEST (x) -> Production, 10% (x) -> Migration, 30% (x).]

Here, x is a cross mark meaning the developer is not involved in that part of the flow, and ✓ means the developer is involved in the project and implements all TEN requirements shown above.

Production based companies are IBM and so on. Migration means support based companies such as TCS, Cognizant, Satyam Mahindra and so on. In migration, both server jobs and parallel jobs are handled (server jobs -> parallel jobs): up to 2002 the server job environment was used, and from 2002 until now the parallel environment is used. IBM launched X-Migrator, which converts server jobs to parallel jobs: up to 70% automatically and 30% manually.

A project is divided into categories with respect to its period (the time taken by the project), as shown below:

Category | Period (in months and years)
Simple | 6 months
Medium | 6 months - 1 year
Complex | 1 - 1½ years
Too complex | 1½ - 5 years and so on (it may take many years, depending on the project)

5.1. Project Process:
HLD Requirements (high level documents): SRS, BRD (prepared by the business analyzer/subject matter expert)

HLD Warehouse: Architecture, Schema (structure), Dimensions and tables (target tables), Facts

LLD (low level docs): TD, Mapping Docs (specifications - spec's), Test Spec's, Naming Docs

5.2. Mapping Document: For example, suppose a query requires 1. the experience of the employee, 2. dname, and 3. first name, middle name, last name. For this, the mapping in a pictorial way looks like this:

Common fields: S.no | Load order | Target entity | Target attributes | Source tables | Source fields | Transformation | Constant | PK / FK / SK | Error handling

Target entity: Exp_tbl
Target attributes: Eno, FName, MName, LName, Exp_emp, DName
Source tables: Emp, Dept
Source fields: Eno, Ename, Hire date, Dno, Dname
Transformation: Exp_emp = Current Date - Hire date (CD - HD)

Funneling: get data from multiple tables (S1, S2) and combine them ("C" is combining) into the target. Combining is either horizontal combining or vertical combining; as per this example, horizontal combination is used.

[Figure: Emp and Dept rows -> HC -> Trg]

Here, HC means horizontal combination, which is used to combine primary rows with secondary rows.
As a developer you will get a maximum of 100 source fields and 30 target fields.
"Look Up" means cross verification from the primary table.

After the document: the sources can be .txt files (ff, cv, vl, sc, s & t, h & t) or databases (types of db):
S1, S2 -> T -> HC -> TRG

(Format of the Mapping Document.)
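One way to read a mapping document is as a set of transformation rules. The Python sketch below applies the rules from the example under stated assumptions: the Emp and Dept rows are hypothetical, the "current date" is passed in so the result is repeatable, Exp_emp is measured in days, and unmatched Emp rows go to a reject list. The function names are illustrative, not DataStage APIs.

```python
from datetime import date

# Hypothetical source rows: Emp(Eno, Ename, Hire_date, Dno) and Dept(Dno -> Dname).
EMP = [
    {"eno": 111, "ename": "naveen kumar reddy", "hire_date": date(2005, 6, 1), "dno": 10},
    {"eno": 222, "ename": "munna", "hire_date": date(2008, 1, 15), "dno": 20},
    {"eno": 333, "ename": "sravan k", "hire_date": date(2009, 3, 1), "dno": 99},
]
DEPT = {10: "ETL", 20: "SALES"}


def split_name(ename):
    """Split Ename into (FName, MName, LName); missing parts become ''."""
    parts = ename.split()
    first = parts[0] if parts else ""
    last = parts[-1] if len(parts) > 1 else ""
    middle = " ".join(parts[1:-1])
    return first, middle, last


def map_to_exp_tbl(emp_rows, dept, current_date):
    """HC Emp with Dept on Dno, split the name, and derive Exp_emp = CD - HD."""
    target, rejects = [], []
    for row in emp_rows:
        if row["dno"] not in dept:  # no matching Dept row
            rejects.append(row)     # error handling: keep it for review
            continue
        fname, mname, lname = split_name(row["ename"])
        target.append({
            "eno": row["eno"],
            "fname": fname,
            "mname": mname,
            "lname": lname,
            "exp_emp_days": (current_date - row["hire_date"]).days,  # CD - HD
            "dname": dept[row["dno"]],
        })
    return target, rejects


if __name__ == "__main__":
    target, rejects = map_to_exp_tbl(EMP, DEPT, date(2010, 1, 1))
    print(target[0])  # naveen / kumar / reddy, 1675 days, ETL
    print(rejects)    # the Dno 99 row
```

Keeping rejects in a separate list mirrors the "Error handling" column of the mapping document instead of silently losing rows.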

DAY 6: Architecture of DWH



For example:
[Figure: Reliance Group with branches such as Reliance Comm., Reliance Power and Reliance Fresh; every branch has its own manager (db), and the Top Level manager (TLM) needs the following details from all of them: sales, customer, employee, period, order (input).]

Explanation of the above example: Reliance group has several branches, and every branch has one manager. For all these managers there is one Top Level Manager (TLM), and the TLM needs the details listed above for analysis.

For the above example, the ETL process is done as shown below:
[Figure: ETL process from each manager (RC mgr, ERP, ...) into the DWH, with data marts (mini WH) below it = dependent data marts; Reliance Fresh taken directly from the group through its own ETL process = independent data mart.]

Dependent Data Mart: the ETL process takes all the managers' information (db) and keeps it in the warehouse, so the data transmission between the warehouse and the data marts depends on each other. Here a data mart is also called "Bottom level" or "mini WH", as shown in blue in the above figure, i.e., the data of an individual manager (like RF, RC, RP and so on). Hence a data mart that depends on the WH is called a dependent data mart.

Independent Data Mart: only one (individual) manager, i.e., the data mart directly accesses the ETL process without any help from the warehouse. That is why it is called an independent data mart.

6.1. Two level approaches: The two-layer architecture applies to both approaches:
1. Top - Bottom level approach, and
2. Bottom - Top level approach.

6.1.1. Top - Bottom level approach: The level starts from the top. As per the example, the ETL process goes from the Reliance group's individual managers to the Data Warehouse (top level), and from there to all the separate data marts (bottom level).
[Figure: Top - Bottom level approach. Reliance Group (R Comm., R Power, R Fresh) -> ETL process -> Data Warehouse (top level, Layer I) -> data marts (bottom level, Layer II).]

The top - bottom level approach defined above was invented by W. H. Inmon. Here the warehouse is the top level and all the data marts are the bottom level, as shown in the above figure.

6.1.2. Bottom - Top level approach: Here the ETL process takes the data directly from the data marts (DM) and then puts it into the warehouse for reference purposes, i.e., the DMs are stored in the Data WareHouse (DWH).
[Figure: Bottom - Top level approach. Reliance Group (R Comm., R Power, R Fresh) -> ETL process -> DMs (bottom level, Layer I) -> DWH (top level, Layer II).]

The bottom - top level approach was invented by R. Kimball. Here one data mart (DM) contains information like customer, products, employees, location and so on.

Both approaches, top - bottom and bottom - top, come under the two-layer architecture.

ETL Tools:
o GUI (graphical user interface) tools: these tools "extract the data from heterogeneous sources".
o Programming (coding): ETL program tools are "Teradata / Oracle / DB2 & so on…"

6.2. Four layers of DWH Architecture:

6.2.1. Layer I:
[Figure: Source -> DWH (Layer I); Source -> DM (Layer I)]
In this layer the data is sent directly: in the first case from the source to the Data WareHouse (DWH), and in the second case from the source to a group of Data Marts (DM).

6.2.2. Layer II:
[Figure: TOP - BOTTOM APPROACH: SRC -> DWH (Layer I) -> DM (Layer II). BOTTOM - TOP APPROACH: SRC -> DM (Layer I) -> DWH (Layer II).]
In this layer the data flows from source -> data warehouse -> data mart, and this type of flow is called the "top - bottom approach". In the other case the data flows from source -> data marts -> data warehouse, and this type of flow is called the "bottom - top approach". The Reliance group example above explains this Layer II architecture. (In practice, 99.99% of projects use layer 3 and layer 4.)

6.2.3. Layer III:
[Figure: Source -> ODS (Layer I) -> DWH (Layer II) -> DMs (Layer III)]
In this layer the data flows from source -> ODS (operational data store) -> DWH -> data marts. The new concept added here is the ODS, which stores operational data for a period such as 6 months or one year. That data is used to solve instant problems, and the ETL developer is not involved here: the team that solves these instant/temporary problems is called the interface team. After the period, the ODS data is stored in the DWH and from there it goes to the DMs; this is where the ETL developers are involved in layer 3. The example below explains the layer 3 architecture clearly.
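The ODS-to-DWH hand-over in Layer III can be pictured as a retention rule: rows stay in the ODS for a fixed period and are then moved into the warehouse. This is a hypothetical Python sketch, assuming a 365-day retention window (the notes mention 6 months or one year) and plain lists standing in for the ODS and DWH tables; `age_out` is an illustrative name.

```python
from datetime import date, timedelta

ODS_RETENTION = timedelta(days=365)  # assumed: "one year" of ODS history


def age_out(ods_rows, dwh_rows, today):
    """Move ODS rows older than the retention period into the DWH.

    Returns the rows that stay in the ODS; aged rows are appended to dwh_rows."""
    remaining = []
    for row in ods_rows:
        if today - row["captured_on"] > ODS_RETENTION:
            dwh_rows.append(row)   # kept in the warehouse for future reference
        else:
            remaining.append(row)  # still used by the interface team
    return remaining


if __name__ == "__main__":
    ods = [
        {"problem_id": 1, "captured_on": date(2009, 1, 1)},
        {"problem_id": 2, "captured_on": date(2009, 12, 1)},
    ]
    dwh = []
    ods = age_out(ods, dwh, today=date(2010, 3, 1))
    print([r["problem_id"] for r in ods], [r["problem_id"] for r in dwh])  # [2] [1]
```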

Example 1:
[Figure: Operations data store example. Source: an aeroplane waiting to land at the airport terminal because of a technical problem -> airport base station, where the interface team is involved (at least or at most 2 hrs to solve the problem) and the problem information is captured -> ODS, a database that stores the technical problems for 1 year -> DWH, which stores the problem information for future reference -> DMs, where the ETL developers are involved. Layers I, II and III.]

Example explanation: In this example, the source is an aeroplane that is waiting to land at the airport terminal, but it is not supposed to land because of some technical problem at the airport base station. A special team, the interface team, is involved to solve this type of operational problem. At the airport base station the technical problems are captured in the Operations Data Store (ODS) database, i.e., simply said, the problem information is captured. But the ODS stores the data for one year only; the years of data are stored in the data warehouse, for future reference and so that the technical problems are not repeated. From the DWH the data goes to the data marts, where the ETL developers are involved to solve the technical problems. This is called the layer 3 architecture of the data warehouse.

DAY 7: Continued… Project Architecture:
7.1. Layer IV: Layer 4 is also called "Project Architecture".

[Figure: Project Architecture / layer IV. Source 1, Source 2, … -> interface files (FLAT FILES) (Layer I) -> ETL reads the flat files through DS (L2), with a format MISMATCH check and a condition MISMATCH check -> ODS -> DW (L3) -> SVC and DM -> BI and reporting (L4). A look up provides data backup of the DWH & SVC for business intelligence.]

Here, ODS = operational data store, DW = Data Warehouse, DM = Data Mart, SVC = Single view customer, BI = Business Intelligence, and L2, L3, L4 = layers 2, 3, 4. A dotted line (……) marks reference data and a dotted arrow (……>) marks reject data.

About the project architecture:
In the project architecture there are 4 layers.
o In the first layer, data goes from the source to the interface files (flat files).
o Coming to the second layer, ETL reads the flat files through DataStage (DS) and sends them to the ODS. When ETL sends the flat files to the ODS, any mismatched data is dropped. There are two types of mismatched data: 1. Condition mismatch and 2. Format mismatch.
o In the third layer, ETL transfers the data to the warehouse.
o In the last layer, the data warehouse is checked for a single view of the customer, and the data is loaded or transmitted between the DWH and the DM (business intelligence).

Note: the following is about the data dropped when ETL reads the flat files (.txt, .csv, .xml and so on) and transmits them to the ODS. The two types of mismatched data are:

Condition mismatch (CM): this verifies whether the data from the flat files satisfies the conditions or is mismatched; if it is mismatched, the record is dropped automatically. To see the dropped data, a reference link is used, and it shows which records are condition mismatched.

Format mismatch (FM): this is like condition mismatch, but it checks whether the format of the data or records being sent is correct or mismatched. Here also a reference link is used to see the dropped data.

Example for condition mismatch: an employee table contains some data.
SQL> select * from emp;

EID | ENAME | DNO
08 | Naveen | 10
19 | Munna | 20
99 | Suman | 30
15 | Sravan | 10

[Figure: emp tbl -> TRG takes only the employees with dno = 10; the reference link drops the records with dno 20 and 30 from emp.]

Example for format mismatch:

EID | EName | Place
111 | naveen | mncl
222 | knl | munna   (x)

Here the table format is tab - space separated. The record marked with a cross (x) has a format mismatch, so that record is just rejected.
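Both checks can be sketched in Python, with a separate list playing the role of the reference (reject) link. Assumptions: the condition is "dno = 10" as in the example, the flat file is tab-separated with EID, EName, Place, and the format rule is "exactly 3 tab-separated fields with a numeric EID". The sample file, in which the bad record uses spaces instead of tabs, and the function names are hypothetical.

```python
import csv
import io


def split_on_condition(rows, condition):
    """Condition mismatch: rows failing `condition` go to the reject list."""
    accepted, rejected = [], []
    for row in rows:
        (accepted if condition(row) else rejected).append(row)
    return accepted, rejected


def split_on_format(flat_file_text):
    """Format mismatch for a tab-separated EID, EName, Place file.

    Assumed rule: exactly 3 tab-separated fields and a numeric EID."""
    accepted, rejected = [], []
    for fields in csv.reader(io.StringIO(flat_file_text), delimiter="\t"):
        well_formed = len(fields) == 3 and fields[0].isdigit()
        (accepted if well_formed else rejected).append(fields)
    return accepted, rejected


EMP = [(8, "Naveen", 10), (19, "Munna", 20), (99, "Suman", 30), (15, "Sravan", 10)]
# Hypothetical file: the second record uses spaces instead of tabs.
FLAT_FILE = "111\tnaveen\tmncl\n222 knl munna\n"

if __name__ == "__main__":
    trg, reject_link = split_on_condition(EMP, lambda r: r[2] == 10)
    print(trg)          # [(8, 'Naveen', 10), (15, 'Sravan', 10)]
    print(reject_link)  # [(19, 'Munna', 20), (99, 'Suman', 30)]
    print(split_on_format(FLAT_FILE))
    # ([['111', 'naveen', 'mncl']], [['222 knl munna']])
```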

7.2. Single View Customer (SVC): It is also called "single version of truth". For example:
Q: How do we make a unique customer?
o Same records in Phase - II -> identified field by field.
o In Phase - III -> cannot be identified.

Before (5 multiple records of customers):
CName | Adds
naveen | savings
munna | insurance
suman | credit
naveen | loan
munna | deposit

After transforming (SVC / single version of truth):
CName | Adds
naveen | savings, loan
munna | insurance, deposit
suman | credit

Here the DataStage people are involved in this process.
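The before/after tables can be reproduced with a small Python grouping sketch. It assumes the records are already matched on an exact customer name; in real SVC work, deciding which records belong to the same person is the Phase II matching step. `single_view` is an illustrative name.

```python
def single_view(records):
    """Collapse (customer, adds) records into one row per customer."""
    view = {}
    for cname, adds in records:
        view.setdefault(cname, []).append(adds)
    return view


RECORDS = [
    ("naveen", "savings"),
    ("munna", "insurance"),
    ("suman", "credit"),
    ("naveen", "loan"),
    ("munna", "deposit"),
]

if __name__ == "__main__":
    for cname, adds in single_view(RECORDS).items():
        print(cname, ", ".join(adds))
    # naveen savings, loan
    # munna insurance, deposit
    # suman credit
```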

This type of transforming is also called Reverse Pivoting.
NOTE: Business intelligence (BI DM) is for data backup of the DWH & SVC (single version of truth).

DAY 8

Dimensional Model: Modeling represents, in a physical or logical design, how data goes from our source system to the target system.
o Logical design: client perspective.
o Physical design: database perspective.
[Figure: pictorial view / logical view of EMP and DEPT; the design is done optionally or manually (designing manual).]

Data modelers use DM tools such as ERWIN and ER - STUDIO, for forward engineering and reverse engineering. Entity relation windows (ERWIN) and Entity relation studio (ER-Studio) are the two data modeling tools in which the logical and physical design is done.

Meta Data: every entity has a structure, and that structure is called Meta Data (simply said, "data about data").
o In a table there are attributes and domains; there are two types of domain: 1. Alphabetical and 2. Number.

Forward Engineering (FE): starting from scratch.
Reverse Engineering (RE): creating from an existing system; simply said, "altering the existing process".

For example:
Q: A client requires the experience of an employee.
[Figure: SRC EMP_table (ENo, EName, Hire Date) -> TRG (employee hire detailed information: ENo, EName, Years, Months, Days, Hours, Minutes, Seconds, Nano_Seconds).]
o Implicit requirement (from the client): the experience of the employee, from the Hire Date.
o Explicit requirement (from the developer's point of view): find out everything the client wants to see as per the requirement, down to the lowest level of detailed information.

8.1. Dimensional Table: Finding out everything the client wants to see as per the requirement, i.e., the "lowest level of detailed information" in the target tables, is called a Dimensional Table.
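The "lowest level of detailed information" for the experience example can be sketched with Python's standard `datetime`. This is an illustration, not a DataStage function: `experience_breakdown` is a hypothetical helper, it stops at microseconds because `datetime` has no nanosecond resolution, and it leaves out years and months because those need calendar-aware arithmetic (months have different lengths).

```python
from datetime import datetime


def experience_breakdown(hired, now):
    """Break the experience interval into days, hours, minutes, seconds and
    microseconds (Python's datetime has no nanosecond resolution)."""
    delta = now - hired
    hours, remainder = divmod(delta.seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return {
        "days": delta.days,
        "hours": hours,
        "minutes": minutes,
        "seconds": seconds,
        "microseconds": delta.microseconds,
    }


if __name__ == "__main__":
    hired = datetime(2009, 1, 1, 9, 0, 0)
    now = datetime(2009, 1, 3, 10, 30, 15, 500)
    print(experience_breakdown(hired, now))
    # {'days': 2, 'hours': 1, 'minutes': 30, 'seconds': 15, 'microseconds': 500}
```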

Q: How are the tables interconnected? This is shown below.

Here we take some tables and link each of them with related tables, as in the product tables below. The link is created using a foreign key and a primary key.
o Foreign Key: a constraint that is used as a reference to another table.
o Primary Key: a constraint that is a combination of unique and not null.
o Surrogate Key: an artificial, system-generated key (typically a sequence number) with no business meaning.

The tables are as follows (link established using Fk & Pk):
Product table: Product_ID (Primary Key), PRD_Desc, PRD_TYPE_ID (Foreign Key)
PRD_TYPE table: PRD_TYPE_ID (Primary Key), PRD_SP_ID (Foreign Key), PRD_Category
PRD_SP table: PRD_SP_ID (Primary Key), SName, ADD1
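The Fk -> Pk links above can be checked mechanically. The Python sketch below, with hypothetical rows for the three tables, finds "orphan" child rows whose foreign key has no matching primary key, and shows one simple way to attach surrogate keys (a plain sequence counter; real warehouses usually take these from a database sequence or a key generator). The helper names are illustrative.

```python
from itertools import count

# Hypothetical rows for the three linked tables.
PRD_SP = [{"prd_sp_id": "SP1", "sname": "supplier one", "add1": "HYD"}]
PRD_TYPE = [{"prd_type_id": "T1", "prd_sp_id": "SP1", "prd_category": "soap"}]
PRODUCT = [
    {"product_id": 1, "prd_desc": "lux", "prd_type_id": "T1"},
    {"product_id": 2, "prd_desc": "unknown", "prd_type_id": "T9"},
]


def orphans(child_rows, fk, parent_rows, pk):
    """Child rows whose foreign key value has no matching primary key."""
    parent_keys = {row[pk] for row in parent_rows}
    return [row for row in child_rows if row[fk] not in parent_keys]


def add_surrogate_keys(rows, start=1):
    """Attach an artificial sequence number (sk) that carries no business meaning."""
    seq = count(start)
    return [dict(row, sk=next(seq)) for row in rows]


if __name__ == "__main__":
    print(orphans(PRODUCT, "prd_type_id", PRD_TYPE, "prd_type_id"))  # product 2
    print(orphans(PRD_TYPE, "prd_sp_id", PRD_SP, "prd_sp_id"))        # []
    print([r["sk"] for r in add_surrogate_keys(PRODUCT, start=1000)])  # [1000, 1001]
```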

8.2. Normalized Data: Repetitive information or records in a table is called Redundancy, and the technique of minimizing it is called Normalization. For example:

ENO | EName | Designation | DNo | Higher Quali. | Add1 | Add2
111 | naveen | ETL Developer | 10 | M.TECH | JNTU | HYD
222 | munna | System analysis | 20 | M.SC | SVU | HYD
333 | Sravan | JAVA Developer | 10 | M.TECH | JNTU | HYD
444 | Raju | Call Center | 30 | M.SC | … | …

This is repetitive information, or redundancy.

The table is divided into two tables, with DNo as the foreign key (Fk) in EMP and the primary key (Pk) in DEPT:

EMP: ENO | EName | Designation | DNo (Fk)
111 | naveen | ETL Developer | 10
222 | munna | System analysis | 20
333 | Sravan | JAVA Developer | 10
444 | Raju | Call Center | 30
555 | Rajesh | JAVA … | …

DEPT: DNo (Pk) | Higher Quali. | Add1 | Add2
10 | M.TECH | JNTU | HYD
20 | M.SC | SVU | HYD

This dividing technique is known as Normalization (or) reducing Redundancy. The target table must always be in De-Normalized format.
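The split shown above can be written as a short Python sketch using the complete rows of the example. It assumes that DNo determines Higher Quali., Add1 and Add2 (that dependency is what makes the repetition redundant), so the first row seen for each DNo is kept in DEPT. `normalize` is an illustrative name.

```python
# Rows of the redundant (de-normalized) table from the example above.
DENORMALIZED = [
    (111, "naveen", "ETL Developer", 10, "M.TECH", "JNTU", "HYD"),
    (222, "munna", "System analysis", 20, "M.SC", "SVU", "HYD"),
    (333, "Sravan", "JAVA Developer", 10, "M.TECH", "JNTU", "HYD"),
]


def normalize(rows):
    """Split the wide table into EMP (DNo as foreign key) and DEPT (DNo as
    primary key), so each DNo's repeated columns are stored only once."""
    emp, dept = [], {}
    for eno, ename, designation, dno, quali, add1, add2 in rows:
        emp.append((eno, ename, designation, dno))
        dept.setdefault(dno, (quali, add1, add2))  # first occurrence wins
    return emp, dept


if __name__ == "__main__":
    emp, dept = normalize(DENORMALIZED)
    print(emp)
    print(dept)  # {10: ('M.TECH', 'JNTU', 'HYD'), 20: ('M.SC', 'SVU', 'HYD')}
```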


[Figure: Normalization splits one table into several; De-Normalization combines them back using HC.]

De-Normalization means combining multiple tables into one table, and the combining is done by horizontal combine. But de-normalizing is not a must in all cases.

DAY 9

E-R Model: An Entity-Relationship Model: In logical design, there are two options to design a job: 1. Optional, and 2. Manual. Mandatory is a must: 1 primary table & n secondary tables.

The given two tables, EMP and DEPT:

EMP table: ENO | EName | Designation | DNo
111 | naveen | ETL Developer | 10
222 | munna | System analysis | 20
333 | Sravan | JAVA Developer | 10
444 | Raju | Call Center | 30
555 | Rajesh | JAVA … | …

DEPT table: DNo | Higher Quali. | Add1 | Add2
10 | M.TECH | JNTU | HYD
20 | M.SC | SVU | HYD

Primary (also known as Master Table) and Secondary (also known as Child Table): from the above two tables, the primary table is the DEPT table, because it does not depend on any other table, and the EMP table is the secondary table, because it depends on the DEPT table.

But in real time, when we join the two tables using horizontal combining, the EMP table is taken as the primary table and the DEPT table as the secondary table.

9.1. Horizontal Combine:
To perform horizontal combining we must follow these cases:
o It must have multiple sources.
o There should be a dependency: 1 - primary, n - secondary.

Horizontal combining is also called JOIN. HC means combining primary rows with secondary rows based on the primary key column values. There are three types of keys:
o Primary key,
o Foreign key, and
o Surrogate key.

For example, combining the two tables above (DNo is the Fk in EMP and the Pk in DEPT) by using HC gives:

ENO | EName | Designation | DNo | Higher Quali. | Add1 | Add2
111 | naveen | ETL Developer | 10 | M.TECH | JNTU | HYD
222 | munna | System analysis | 20 | M.SC | SVU | HYD
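Horizontal combining can be sketched in Python as a key lookup, with EMP as the primary input as described above. This sketch keeps every primary row and fills the DEPT columns with None when there is no match (left outer join semantics), so the output row count equals the primary row count; an inner join would drop such rows instead. The data is taken from the example, and the function name is illustrative, not a DataStage stage.

```python
def horizontal_combine(primary, secondary, key):
    """HC / JOIN: combine each primary (EMP) row with the secondary (DEPT)
    row that has the same key value. Primary rows without a match keep None
    in the secondary columns (left outer join)."""
    lookup = {row[key]: row for row in secondary}  # assumes key is unique in secondary
    extra_cols = [c for c in (secondary[0] if secondary else {}) if c != key]
    combined = []
    for row in primary:
        match = lookup.get(row[key], {})
        combined.append({**row, **{c: match.get(c) for c in extra_cols}})
    return combined


EMP = [
    {"eno": 111, "ename": "naveen", "designation": "ETL Developer", "dno": 10},
    {"eno": 222, "ename": "munna", "designation": "System analysis", "dno": 20},
    {"eno": 444, "ename": "Raju", "designation": "Call Center", "dno": 30},
]
DEPT = [
    {"dno": 10, "quali": "M.TECH", "add1": "JNTU", "add2": "HYD"},
    {"dno": 20, "quali": "M.SC", "add1": "SVU", "add2": "HYD"},
]

if __name__ == "__main__":
    for row in horizontal_combine(EMP, DEPT, "dno"):
        print(row)  # Raju (DNo 30) has no DEPT row, so his DEPT columns are None
```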

9.2. Different types of Schemas: There are four types of schemas:
o STAR Schema,
o Snow Flake Schema,
o Multi STAR Schema, and
o Galaxy Schema.

1. STAR Schema: In the star schema, you must know about two things:
o Dimensional table: the "lowest level detailed information" of a table.
o Fact table: a collection of foreign keys from n dimensional tables.

Definition of STAR Schema: "A fact table, which is a collection of foreign keys, surrounded by multiple dimensional tables, where each dimensional table is a collection of de-normalized data, is called a STAR Schema."

The data transmission is done in two different methods:
o In a pictorial way: Source -> T -> DIM table -> T -> FACT table.
o In a practical way, the data goes directly from the source to the dimensional table and to the fact table: Source -> T -> DIM table, and Source -> T -> FACT table.

Example for STAR Schema: "Take some tables as below and derive a star schema from them."
Q: Display what suman bought as a lux product in ameerpet in the 1st week of January.

[Figure: a Fact table with Fk columns (also called the bridge/intermediate table), surrounded by Cust_Dim_tbl, PRD_Dim_tbl, Date_Dim_tbl and Loc_Dim_tbl, each with a Pk. Source tables: Product, Brand, Category, Customer, Unit, Customer_Category and Location tables.]

Here, Pk = primary key and Fk = foreign key. As shown above, the fact table is surrounded by the dimensional tables, and the fact table is a collection of foreign keys; each dimensional table holds the lowest level detailed information. The fact table is also called a bridge or intermediate table. But in the current market, STAR Schema and Snow Flake Schema are used rarely.

In the fact table, measurements mean taking the information as per the client requirement or user requirement. As per the above question, it needs information from PRD_dim_tbl, Date_dim_tbl, Cust_dim_tbl, and Loc_dim_tbl. The link to the measurements, i.e., to the Fact table, is created by foreign key and primary key.
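The example question can be answered against a tiny in-memory star schema. In this Python sketch all rows are hypothetical (including the extra customer, brand and area used as non-matching data), the measurement is a quantity, and "1st week of January" is taken to mean days 1-7. The query resolves each fact row's foreign keys against the dimension tables' primary keys and keeps the matching rows.

```python
from datetime import date

# Hypothetical dimension tables: primary key -> lowest-level details.
CUST_DIM = {1: {"name": "suman"}, 2: {"name": "naveen"}}
PRD_DIM = {10: {"product": "soap", "brand": "lux"},
           11: {"product": "shampoo", "brand": "clinic plus"}}
DATE_DIM = {100: {"day": date(2010, 1, 3)}, 101: {"day": date(2010, 1, 20)}}
LOC_DIM = {7: {"area": "ameerpet"}, 8: {"area": "kukatpally"}}

# Fact table: only foreign keys to the dimensions plus a measurement (qty).
SALES_FACT = [
    {"cust_fk": 1, "prd_fk": 10, "date_fk": 100, "loc_fk": 7, "qty": 2},
    {"cust_fk": 1, "prd_fk": 10, "date_fk": 101, "loc_fk": 7, "qty": 1},
    {"cust_fk": 2, "prd_fk": 10, "date_fk": 100, "loc_fk": 7, "qty": 5},
    {"cust_fk": 1, "prd_fk": 11, "date_fk": 100, "loc_fk": 8, "qty": 1},
]


def purchases(customer, brand, area, year, month, first_day, last_day):
    """Resolve each fact row's foreign keys and keep the matching purchases."""
    result = []
    for fact in SALES_FACT:
        cust = CUST_DIM[fact["cust_fk"]]
        prd = PRD_DIM[fact["prd_fk"]]
        day = DATE_DIM[fact["date_fk"]]["day"]
        loc = LOC_DIM[fact["loc_fk"]]
        if (cust["name"] == customer and prd["brand"] == brand
                and loc["area"] == area and (day.year, day.month) == (year, month)
                and first_day <= day.day <= last_day):
            result.append((prd["product"], day, fact["qty"]))
    return result


if __name__ == "__main__":
    # What did suman buy as a lux product in ameerpet in the 1st week of January?
    print(purchases("suman", "lux", "ameerpet", 2010, 1, 1, 7))
    # [('soap', datetime.date(2010, 1, 3), 2)]
```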

2. Snow Flake Schema: The fact table is surrounded by dimensional tables, and each dimensional table has lookup tables; this is called a Snow Flake Schema. For example:
[Figure: Fact table -> EMP_tbl -> Dept_tbl -> Locations (Area), each link an Fk -> Pk pair.]

If we need information from the location table, it is fetched from that table and displayed as the client requires. When a dimensional table holds huge data, it is sometimes not possible to bring that data back quickly from one dimensional table at once. That is the reason we divide the dimensional table into several tables, and those tables are known as "look up tables".

Definition of Snow Flake Schema: "The fact table surrounded by dimensional tables, where each dimensional table has look up tables, is called a Snow Flake Schema."

[Figure: Source -> DWH -> Reports. With de-normalized (D) dimensional data the STAR Schema works effectively; with normalized (N) dimensional data the Snow Flake Schema works effectively.]

NOTE: Selection of the schema at run time depends on report generation.
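The difference between the two schemas shows up in how a dimension attribute is reached. In this hypothetical Python sketch, the snowflaked lookup follows two Fk -> Pk hops (EMP_tbl -> Dept_tbl -> Locations), while the star-style row de-normalizes the same data into one wide dimension row that needs no further lookups. All rows and function names are illustrative.

```python
# Hypothetical snowflaked dimension: EMP_tbl -> Dept_tbl -> Locations.
EMP_TBL = {111: {"ename": "naveen", "dno": 10}, 222: {"ename": "munna", "dno": 20}}
DEPT_TBL = {10: {"dname": "ETL", "loc_id": 1}, 20: {"dname": "SALES", "loc_id": 2}}
LOCATIONS = {1: {"city": "HYD"}, 2: {"city": "BLR"}}


def employee_city(eno):
    """Snow Flake style: follow the Fk -> Pk links through the look up tables."""
    dept = DEPT_TBL[EMP_TBL[eno]["dno"]]      # hop 1: EMP_tbl -> Dept_tbl
    return LOCATIONS[dept["loc_id"]]["city"]  # hop 2: Dept_tbl -> Locations


def star_row(eno):
    """STAR style: de-normalize the same data into one wide dimension row."""
    emp = EMP_TBL[eno]
    dept = DEPT_TBL[emp["dno"]]
    return {"eno": eno, "ename": emp["ename"], "dno": emp["dno"],
            "dname": dept["dname"], "city": LOCATIONS[dept["loc_id"]]["city"]}


if __name__ == "__main__":
    print(employee_city(222))  # BLR
    print(star_row(111))
    # {'eno': 111, 'ename': 'naveen', 'dno': 10, 'dname': 'ETL', 'city': 'HYD'}
```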

:: DataStage CONCEPTS ::

DAY 10: DataStage (DS) Concepts:
History of DS, Features of DS, Differences between the 7.5.x2 and 8.0.1 versions, Architecture of the 7.5.x2 and 8.0.1 versions, Enhancements and new features of version 8.0.1.

HISTORY of DataStage
As of the year 2006 there were 600 ETL tools in the market; some of them are DataStage Parallel Extender, ODI (OWB), SAS (ETL Studio), BODI, Ab Initio and so on… But DataStage is very famous, widely used in the market, and also very expensive.

Q: What is DataStage?
ANS: DataStage is a comprehensive ETL tool which provides an End - to - End Enterprise Resource Planning (ERP) solution (here, comprehensive means good in all areas).

History begins:
o In 1997, the first version of DataStage was released by VMARK company, a US based company, and Mr. LEE SCHEFFLER is the father of DataStage. Only 5 members were involved in releasing the software into the market. In those days DataStage was called "Data Integrator".

o There have been 90% changes from 1997 to 2010 when comparing the release versions. In 1997, Data Integrator was acquired by a company called TORRENT. After two years, i.e., in 1999, INFORMIX Company acquired Data Integrator from the TORRENT Company.
o In 2000, ASCENTIAL Company acquired both the Data Base and Data Integrator, and after that ASCENTIAL DataStage Server Edition was released in the same year.
  o Through this company DataStage became popular in the market from that year.
  o The released software used 30 tools to run.
o In 2002, ADSS + ORCHESTRATE: the ASCENTIAL company integrated with the ORCHESTRATE company for parallel capabilities.
  o This is because ORCHESTRATE (PX, UNIX) had parallel extendable capabilities in the UNIX environment. By integrating ADSS + ORCHESTRATE, they named it ADSSPX.
  o ADSSPX is version 6.0; from that version the parallel operations, or parallel capabilities, started.
  o From there, the parallel versions went on developing up to version 7.5.1.
  o But versions 6.0 to 7.5.1 supported only UNIX flavor environments, because the server was configured only on the UNIX platform.
o In 2004, version 7.5.x2 was released, which supports server configuration for the Windows platform also.
  o For this, ADSSPX was integrated with MKS_TOOL_KIT. MKS_TOOL_KIT is a virtual UNIX machine that brings these capabilities to Windows to support server configuration.
  o NOTE: After installing ADSSPX + MKS_TOOL_KIT on Windows, all the UNIX commands work on the Windows platform.


o In December 2004, version 7.5.x2 had the ASCENTIAL suite components:
  o They are Profile stage, Quality stage, Audit stage, Meta stage, DataStage PX, DataStage TX, DataStage MVS and so on; these are individual tools.
  o There are 12 types of ASCENTIAL suite components.
o In February 2005, IBM acquired all the ASCENTIAL suite components and released IBM DS EE, i.e., the enterprise edition.
o In 2006, IBM made some changes to IBM DS EE: it integrated the profiling stage and audit stage into one, plus quality stage, Meta stage, and DataStage PX.
  o With the combination of these four stages they released "IBM WEBSPHERE DS & QS 8.0".
  o This is also called the "Integrated Developer Environment", i.e., IDE.
o In 2009, IBM released another version, "IBM INFOSPHERE DS & QS 8.1".
  o In the current market, 7.5.x2 is used by 40 - 50%, 8.0.1 by 30 - 40%, and 8.1 by 10 - 20%.


NOTE: DataStage is a front end; nothing is stored in it.


DAY 11: DataStage FEATURES
Features of DS: There are 5 important features of DataStage:
o Any to Any,
o Platform Independent,
o Node configuration,
o Partition parallelism, and
o Pipeline parallelism.

Any to Any:
o DataStage is capable of reading from any source and writing to any target.

Platform Independent:
o "A job that can run on any processor is called platform independent."
o There are three types of processor: UNI, Symmetric Multi Processor (SMP), and Massively Parallel Processor (MPP).

[Figure: UNI: one CPU with its own HDD. SMP: several CPUs sharing one HDD. MPP: many SMP nodes (SMP-1, SMP-2, SMP-3, …, SMP-n), each with its own CPUs.]

Node Configuration:
o A node is software that is created in the operating system.
o "A node is a logical CPU", i.e., an instance of a physical CPU.
o Hence, using software, "the process of creating virtual CPUs is called Node Configuration."
o The node configuration concept works exclusively in DataStage; it is the best feature compared with other ETL tools.
o For example:
  Q: An ETL job requires executing 1000 records.
  A UNI processor takes 10 mins to execute the 1000 records, but an SMP processor with four CPUs takes 2.5 minutes (10 / 4) to execute the same 1000 records, as shown in the diagram below.

[Figure: UNI: 1000 records -> HDD -> one CPU -> 10 minutes. SMP: 1000 records -> HDD -> four CPUs -> 2.5 minutes. Here the 1000 records are shared by four CPUs, hence the execution time is reduced.]

o As per the above example, Node Configuration can also create virtual CPUs to reduce the execution time on a UNI processor.
o Using Node Configuration for the above example on a UNI processor: the figure below shows how the virtual CPUs are created and reduce the execution time of the process.

[Figure: 1000 records -> HDD -> one physical CPU with multiple nodes created (Node1, Node2, Node3, Node4) -> 10 minutes reduced to 2.5 minutes.]

Partition parallelism:
o Partitioning is distributing the data across the nodes based on partition techniques.
o Consider an example of why we use partition techniques: take some records in the EMP table and some in the DEPT table. The EMP table has 9 records and the DEPT table has 3 records.
o After partitioning these records, the output must have 9 records, because the primary table has 9 records.
  EMP(10,20,10,30,20,10,10,20,30) and DEPT(10,20,30)

[Figure: without key based partitioning, EMP is split over nodes N1, N2, N3 as (10,20,10), (30,20,10), (10,20,30) and DEPT as (10), (20), (30). N1 matches only 2 rows, N2 matches 1 and N3 matches 1: only 4 matched in total, but the output must be 9 records.]

In the above example, only 4 records are in the final output and 5 records are missing; for this reason partition techniques are introduced. There are two categories of partition parallelism, and in total there are 8 partition techniques:

Key based | Key less
Hash | Same
Modulus | Random
Range | Entire
DB2 | Round robin

o Key based techniques give the assurance that rows with the same key column value are collected in the same partition.
o Key less techniques are used to append the data for joining the given tables.

For the above records, we partition using key based partitioning.
Key based partitioning:
[Figure: EMP and DEPT partitioned on DNO: N1 gets all DNO 10 rows, N2 all DNO 20 rows, N3 all DNO 30 rows, so every EMP row finds its DEPT row.]
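The 4-versus-9 result can be checked with a short Python sketch. Assumptions: only the DNo column is modeled, the key-less split is the consecutive-block split drawn in the figure (not one of DataStage's named key-less methods), and the key-based split uses the modulus rule (node = DNo mod 3). With that rule DNo 30 lands on the first node rather than N3 as drawn, but equal keys still always land together, which is what the join needs.

```python
EMP_DNO = [10, 20, 10, 30, 20, 10, 10, 20, 30]  # EMP rows (only the DNo column)
DEPT_DNO = [10, 20, 30]                         # DEPT rows


def block_partition(rows, n):
    """Key-less split into consecutive blocks, as in the example above."""
    size = -(-len(rows) // n)  # ceiling division
    return [rows[i * size:(i + 1) * size] for i in range(n)]


def modulus_partition(rows, n):
    """Key-based (modulus) partitioning: node = key mod n, so equal keys meet."""
    parts = [[] for _ in range(n)]
    for dno in rows:
        parts[dno % n].append(dno)
    return parts


def join_per_node(emp_parts, dept_parts):
    """Each node can only join the EMP and DEPT rows it holds itself."""
    matched = 0
    for emp_rows, dept_rows in zip(emp_parts, dept_parts):
        dept_keys = set(dept_rows)
        matched += sum(1 for dno in emp_rows if dno in dept_keys)
    return matched


if __name__ == "__main__":
    print(join_per_node(block_partition(EMP_DNO, 3), block_partition(DEPT_DNO, 3)))      # 4
    print(join_per_node(modulus_partition(EMP_DNO, 3), modulus_partition(DEPT_DNO, 3)))  # 9
```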

DAY 12: Continued… Features of DataStage
Partition Parallelism:
o Re - Partition: means re - distributing the distributed data.

ENO | EName | DNo | Loc
111 | naveen | 30 | AP
222 | munna | 10 | TN
333 | Sravan | 20 | KN
444 | Raju | 10 | …

[Figure: EMP is first partitioned on DNo (N1 = 10, N2 = 20, N3 = 30) and joined with DEPT; the result is then re-partitioned on Loc (N1 = AP, N2 = TN, N3 = KN).]

o First the partitioning is done by key based partitioning on dno; then, taking a separate column, location, the distributed data is re - distributed. This is known as Re - Partition.
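Re-partitioning can be sketched in Python as "gather the rows from every node, then hash-partition again on the new key". Assumptions: the first three rows come from the example, the fourth row (Rajesh, AP) is hypothetical because Raju's location is not given, and `zlib.crc32` is used as the hash because Python's built-in `hash()` of strings changes between runs. Which node gets which key is arbitrary; the guarantee is only that equal key values end up together.

```python
import zlib

EMP = [
    {"eno": 111, "ename": "naveen", "dno": 30, "loc": "AP"},
    {"eno": 222, "ename": "munna", "dno": 10, "loc": "TN"},
    {"eno": 333, "ename": "Sravan", "dno": 20, "loc": "KN"},
    {"eno": 555, "ename": "Rajesh", "dno": 10, "loc": "AP"},  # hypothetical extra row
]


def hash_partition(rows, key, n):
    """Key-based (hash) partitioning: rows with equal key values share a node."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[zlib.crc32(str(row[key]).encode()) % n].append(row)
    return parts


def repartition(parts, key, n):
    """Re-partition: redistribute already-partitioned rows on a new key."""
    return hash_partition([row for part in parts for row in part], key, n)


if __name__ == "__main__":
    by_dno = hash_partition(EMP, "dno", 3)
    by_loc = repartition(by_dno, "loc", 3)
    for node, rows in enumerate(by_loc, start=1):
        print(f"N{node}:", [(r["ename"], r["loc"]) for r in rows])
```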

o Reverse Partitioning:
It is also called collecting. It is done in one situation only: "when the data moves from a parallel stage to a sequential stage, collecting happens; only in this case."
When a job is designed in "stages", a link (also called a pipe) is the channel that moves data from one stage to another.

[Figure: SRC -> TRS -> TRG. Example: the parallel files S1, S2, S3 on nodes N1 … Nn are collected into N, i.e., a sequential/single file.]

There are four categories of collecting techniques: Order, Round robin, Sort - merge and Auto.

Example for collecting techniques, with N1 = (a, x), N2 = (b, y), N3 = (c, z):
Order: a x b y c z
Auto: a b c x y z
Round robin (RR): a b c x y z
Sort - merge (S_M): a z y c x b (rows come out in the order of the sort key)

Pipeline Parallelism:
"All pipes carry the data in parallel and the processes are done simultaneously."
o In the server environment, the execution process is called traditional batch processing.
o For example, the execution in the server environment: Extract (S1, 10 min) -> Transform (S2, 10 min) -> Load (S3, 10 min). Here the execution takes 30 minutes to complete.
o The same job in the parallel environment: E, T and L run at the same time, with the pipes carrying the data between S1, S2 and S3.
Here all the pipes carry the data in parallel and process the job simultaneously, and the execution takes only 10 minutes to complete.

By using pipeline parallelism we can reduce the processing time.
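Pipeline parallelism can be imitated in plain Python with one thread per stage and bounded queues as the pipes, so the stages overlap instead of running one after another. This is a sketch, not DataStage's engine: the stage functions and the upper-casing "transform" are hypothetical, it needs Python 3.8+ for the `:=` operator, and because of CPython's global interpreter lock the threads overlap waiting (I/O) rather than CPU-heavy work.

```python
import queue
import threading

DONE = object()  # end-of-data marker passed down the pipes


def extract(rows, out_pipe):
    for row in rows:
        out_pipe.put(row)
    out_pipe.put(DONE)


def transform(in_pipe, out_pipe):
    while (row := in_pipe.get()) is not DONE:
        out_pipe.put(row.upper())  # stand-in transformation
    out_pipe.put(DONE)


def load(in_pipe, target):
    while (row := in_pipe.get()) is not DONE:
        target.append(row)


def run_pipeline(rows):
    """Run E, T and L as concurrent stages linked by bounded queues (pipes)."""
    e_to_t, t_to_l = queue.Queue(maxsize=2), queue.Queue(maxsize=2)
    target = []
    stages = [
        threading.Thread(target=extract, args=(rows, e_to_t)),
        threading.Thread(target=transform, args=(e_to_t, t_to_l)),
        threading.Thread(target=load, args=(t_to_l, target)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return target


if __name__ == "__main__":
    print(run_pipeline(["naveen", "munna", "suman"]))  # ['NAVEEN', 'MUNNA', 'SUMAN']
```

The small `maxsize` keeps each pipe short, so the extract stage cannot run far ahead of the transform stage; that back-pressure is what forces the three stages to work at the same time.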

DAY 13: Differences between 7.5.x2 & 8.0.1
Differences:

7.5.x2 | 8.0.1
4 client components: DS Designer, DS Manager, DS Director, DS Administrator | 5 client components: DS Designer, DS Director, DS Administrator, Web Console, Information Analyzer
Architecture components: Server Component, Client Component | Architecture components: Common User Interface, Common Repository, Common Engine, Common Connectivity, Common Shared Services
2-tier architecture | N-tier architecture
OS dependent w.r.t. users | OS independent w.r.t. users (OS dependent one time only)
Capable of Phase III & Phase IV | Capable of all phases
No web based administration | Web based administration through the web console (simply said, work from home)
File based repository | Database based repository

13.1. Client components of 7.5.x2:
. DS Designer: it is used to create jobs, compile, run, and do multiple job compile. DS Designer can handle 4 types of jobs: Mainframes job, Server job, Parallel job, Job sequence job.
. DS Director: it can handle the list below: Schedule and run jobs; Monitor, Unlock, batch jobs; Views (job, status, logs); Message Handling.
. DS Manager: it can handle the list below: Import and Export of Repository components; Node Configuration.
. DS Administrator: it can handle the list below: Create project, Delete project, Organize project.

13.2. Client components of 8.0.1:
. DS Designer: it is used to create jobs, compile, run, and do multiple job compile. DS Designer can handle 5 types of jobs: Mainframes job, Server job, Parallel job, Job sequence job, Data quality job.
. DS Director: same as shown above for 7.5.x2.
. DS Administrator: same as shown above for 7.5.x2.
. Web Console: the administrator components through which we perform: Security services, Scheduling services, Logging services, Reporting services, Session management, Domain manager.
. Information Analyzer: it performs all Phase I activities: Column analysis, Primary key analysis, Foreign key analysis, Base Line analysis, and Cross domain analysis. It is also called the console for IBM INFO SERVER.

As an ETL developer you will mostly work with DS Designer and DS Director, but you should also know something about the Web Console, Information Analyzer, and DS Administrator.


DAY 14 Description of 7.5.x2 & 8.0.1 Architecture

14.1. Architecture of 7.5.x2:
* Server Components: divided into 3 categories: a. Repository b. Engine c. Package Installer
Repository: also called the project or work area.
o Here the repository is also an Integrated Developer Environment (IDE); the IDE is used to design, compile, run and save jobs.
o The repository organizes different components in one area, i.e., it is a collection of components. Some of the components are: Jobs, Table definitions, Shared containers, Routines, etc.
o The repository is for developing applications as well as storing applications.


Engine: it executes DataStage jobs and automatically selects the partition technique.
o Never leave any stage on Auto partitioning: if we leave it on Auto, the engine selects the auto partition technique itself, and this affects the performance.

Package Installer: this component contains two types of package installer: one is plug-ins and the other is packs.
Example (plug-in): Computer -> Interface -> Printer. To print, the computer needs the printer driver (here the 1100 driver) to be installed. The interface between the computer and the printer is also called a plug-in.
Example (packs): ERP <- Packs -> DS. A familiar example is a normal Windows XP system that gets Service Pack 2 for more capabilities. Here, packs are used to configure DataStage for an ERP solution.

* Client components: divided into 4 categories: a. DS Designer b. DS Manager c. DS Director d. DS Administrator. What these components handle is shown above under Client components of 7.5.x2.

14.2. Architecture of 8.0.1:
1. Common user interface: also called the unified user interface.
a. Web console b. Information analyzer c. DS Designer d. DS Director e. DS Administrator
2. Common Repository: divided into two types:
a. Global repository: the DataStage job files are stored here (it handles security issues).
b. Local repository: individual files are stored here (it is for performance).
o The common repository is also called the Meta Data SERVER.
o It has three types: Project level MD, Design level MD, Operation level MD.
3. Common Engine:
o It is responsible for Data Profiling analysis, Data Quality analysis and Data Transmission analysis.
4. Common Connectivity: it provides the connections to the common repository.


Table representation of "8.0.1 Architecture": Common user interface (Web console, Information Analyzer, DS Designer, DS Director, DS Administrator) / REPOSITORY: MD SERVER (Project level MD, Design level MD, Operation level MD) / Common shared services / Common Engine (DP, DQ, DT) / Common Connectivity.

DAY 15 Enhancements & New features of version 8
In version 8.0.1, there are 8 categories of stages.
Processing stage:
o New Stages:
1. SCD (Slowly Changing Dimension)
2. FTP (File Transfer Protocol)
3. WTX (WebSphere Transformation Extender)
o Enhanced Stages:
1. Surrogate key stage: it is a newly introduced concept.
2. Lookup stage: previously, lookup had i. Normal lookup ii. Sparse lookup; newly added are iii. Range lookup iv. Case less lookup

Data Base Stage:
o New stages: IWAY, Classic federation, ODBC connector, NETEZZA
o Enhanced Stages: all stage techniques are used with respect to the SQL Builder.

Palette of version 8.0.1 (compared with 7.5.x2):
General: no changes
Data Base: changes
File: no changes
Processing: changes
Real Time: no changes
Restructure: no changes
Data Quality: new
Development & Debug: no changes

o The palette holds the shortcuts of the stages; from there we drag and drop stages onto the canvas to design the job.
o Data Quality is an exclusively new concept of 8.0.1.
o The Data Base and Processing stages have some changes, as shown above.
o The other stages are the same as in version 7.5.x2, i.e., no changes in this version.


:: Stages Process & Lab Work::
DAY 16 Starting steps for DataStage tool
To start DataStage on the system and do a job, we must follow these different steps.

Five different steps of the job development process (this is for designing a job):
The DB2 repository is started and the DataStage server is started. After they have started, select DS Designer, enter the uid (admin), enter the pwd (****) and attach the appropriate project.

Figure: the Designer window, with the Palette (opened from the tool bar; it holds the General, Data Quality, Database, File, Development & Debug, Processing, Real Time and Restructure stages) and the Designer Canvas or Editor, where the job is designed.

Select the appropriate stage in the palette and drag it onto the CANVAS. Then link the stages (giving connectivity). After that, setting the properties is important.

The palette contains all the stage shortcuts, i.e., 7 categories of stages in 7.5.2 & 8 categories of stages in 8.0.
These stages are categorized into two groups:
1 -> Active Stage (any stage that does transmission is called an active stage).
2 -> Passive Stage (any stage that does extracting or loading is called a passive stage).
In the 8 categories we use sequential stage and parallel stage jobs.

Save, compile and run the job.

Run the Director to see the views, i.e., to view the status of your job.

DAY 17 My first job creating process
Process:
On the computer desktop, the currently running processes are shown in the left corner. There, a round green symbol is used to start the server when it does not start automatically, i.e., to check whether the DataStage server has started or not. If it has not, start it manually.
When version 8 of DataStage is installed, five client component shortcuts are visible on the desktop:

Web Console, Information Analyzer, DS Administrator, DS Designer, DS Director

Web Console: when you click it, "the login page appears".

o If the server is not started, the error "the page cannot open" will appear.
o If an error like that occurs, the server must be restarted before doing or creating jobs.
DS Administrator: it is for creating/deleting/organizing the project.
DS Director: it is for viewing the status of executed jobs, and for viewing the log, status and warnings.

DS Designer: when you click on the designer icon, it asks you to attach the project for creating a new job, as shown below:
o User id: admin
o Password: ****
o If authentication fails at login, it is because of a repository interface error.

The figure below shows how to authenticate, and shows the designer canvas for creating jobs.
Figure: Attach the project dialog (Domain: localhost:8080, User Name: admin, Password, Project: Teleco, OK / Cancel).

After authentication, it displays the Designer canvas.
o It asks which job you want to do; they are:

Mainframe, Parallel, Sequence, Server jobs

After clicking on parallel jobs, go to tool bar -> View -> Palette. The palette displays the 8 types of stages for designing a job; they are:

General, Data Quality, Data Base, File, Development & Debug, Processing, Real Time, Re-Structure

17.1. File Stage:
Q: How can data be read from files?
The file stage can read only flat files, and the formats of flat files are .txt, .csv, .xml.
In .txt there are different types of formats like fwf, sc, csv, s & t, H & T.
.csv means comma separated values.
.xml means extensible markup language.
. In the File Stage there are sub-stages like the sequential stage, data set, file set and so on.
o Example of how a job can execute: one sequential file (SF) to another SF, i.e., Source (SF) -> Target (SF).

o The source file needs its target/output properties set, and
o the target file needs its input/source properties set.

In the source file, how do we read a file?
o On double clicking the source file, we must set the properties as below:
File name: browse the file name.
Location: for example, in c:\
Format: .txt, .csv, .xml
Structure: meta data

General properties of sequential file:

1. Setting / importing the source file from the local server.
Figure: selecting the file name with the browse button, e.g., File = C:\data\se_source_file.txt (this option is for multiple purposes).

2. Format selection:
The data must be in the format of the input file taken, like "tab / space / comma"; one of them must be selected.


3. Column structure defining:
To get the structure of the file, use the LOAD button. Steps to load a structure:
o Import -> Sequential file
o Browse the file and import; select the imported file
o Define the structure.
These three are the general properties when we design a simple job.

DAY 18 Sequential File Stage

Sequential file stage: we use the sequential file stage only for single structure format files.
Figure: source SF (output properties) -> target SF (input properties).

About the Sequential File Stage and how it works:
Step 1: The sequential file is a file stage that reads flat files of different extensions (.txt, .csv, .xml).
Step 2: SF reads/writes sequentially by default, when it reads/writes from a single file.
o It also reads/writes in parallel when it reads/writes to or from multiple files.
Step 3: The sequential stage supports one input (or) one output, and one reject link.

Link: a link transfers data from one stage to another stage.
o Links are divided into three categories: Stream link, Reject link, Reference link (each drawn as SF -> SF).

Link Marker:
It shows how the link behaves in the transmission from source to target.


1. Ready BOX: it indicates that "a stage is ready with Meta Data", and the data moves from a sequential stage to a sequential stage.
2. FAN IN: it indicates that "the data moves from a parallel stage to a sequential stage"; this is when collecting happens.
3. FAN OUT: it indicates that "the data moves from a sequential stage to a parallel stage"; it is also called auto partition.
4. BOX: it indicates that "the data moves from a parallel stage to a parallel stage"; it is also known as partitioning.

5. BOW - TIE: it indicates that "the data moves from a parallel stage to a parallel stage"; it is also known as re-partitioning.
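The partitioning side of the link markers (FAN OUT, BOX, BOW - TIE) can be modeled in the same list-based way. In this sketch, fan_out deals a sequential stream round robin across n nodes, and repartition redistributes already-partitioned rows by a modulus on an integer key. The modulus rule and both function names are assumptions made for illustration. They are not taken from DataStage.

```python
def fan_out(rows, n):
    """Sequential -> parallel: deal rows round robin onto n partitions."""
    partitions = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        partitions[i % n].append(row)
    return partitions


def repartition(partitions, n, key):
    """Parallel -> parallel with a new layout: row goes to key(row) % n."""
    new_partitions = [[] for _ in range(n)]
    for part in partitions:
        for row in part:
            new_partitions[key(row) % n].append(row)
    return new_partitions


parts = fan_out([1, 2, 3, 4, 5, 6], 3)
print(parts)                                   # [[1, 4], [2, 5], [3, 6]]
print(repartition(parts, 2, key=lambda r: r))  # [[4, 2, 6], [1, 5, 3]]
```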

Link Color: the link color indicates the state of the job during execution.
LINK:
RED: o A link in RED color means: case 1: a stage is not connected properly, or case 2: the job aborted.
BLACK: o A link in BLACK color means "a stage is ready".
BLUE: o A link in BLUE color means "a job execution is in process".
GREEN: o A link in GREEN color means "execution of the job finished".

NOTE: "A stage is an operator; an operator is a pre-built component". This is because a stage uses the import operator to create the Native Format, and Native Format is the format DataStage understands. So, a stage is an operator.

Compile: a compiler is a translator that converts source code to target code.

Compiling a .C program: .C -> HLL -> ALL -> BC
(HLL - High Level Language, ALL - Assembly Level Language, BC - Binary Code)

Compiling process in DataStage: GUI -> OSH code & C++ -> MC
(MC - Machine Code, OSH - Orchestrate Shell Script)

Note: Orchestrate Shell Script is generated for all stages except one, the Transformer stage, which is done in C++. During compilation, it checks for:
o Link Requirement (checks for links)
o Mandatory stage properties
o Syntax Rules

DAY 19 Sequential File Stage Properties
Properties:
Read Methods: two options are
o Specific File: the user or client gives each file name specifically.
o File Pattern: we can use the wild card characters * & ? to search for a pattern. For example: C:\eid*.txt, C:\eid??.txt
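In a File Pattern, * matches any run of characters and ? matches exactly one character. Python's fnmatch module follows the same shell-style rules, so the two example patterns can be tried against some hypothetical file names. The directory part is left out for simplicity.

```python
from fnmatch import fnmatchcase

# Hypothetical file names in the source directory.
files = ["eid1.txt", "eid12.txt", "eid_ab.txt", "emp.txt", "eid7.csv"]


def match_pattern(names, pattern):
    """Return the names that a File Pattern such as eid*.txt would pick up."""
    return [name for name in names if fnmatchcase(name, pattern)]


print(match_pattern(files, "eid*.txt"))   # ['eid1.txt', 'eid12.txt', 'eid_ab.txt']
print(match_pattern(files, "eid??.txt"))  # ['eid12.txt']
```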

Reject Mode: to handle records with a "format/data type/condition" mismatch.

Three options:
o Continue: drops the mismatched record and continues with the other records.
o Fail: the job is aborted.
o Output: captures the dropped data and sends it through the reject link to another sequential file.
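The three Reject Mode options can be sketched for a comma-separated file whose records must have a fixed number of fields. The field-count check stands in for the format, data type and condition checks. JobAborted is a hypothetical exception that models the Fail option.

```python
class JobAborted(Exception):
    """Models the job aborting under Reject Mode = Fail."""


def read_with_reject_mode(lines, n_fields, reject_mode="continue"):
    """Split CSV lines and handle mismatched records per reject_mode.

    Returns (good_records, rejected_lines); rejected_lines is only
    filled when reject_mode == "output" (the reject link).
    """
    good, rejected = [], []
    for line in lines:
        fields = line.split(",")
        if len(fields) == n_fields:
            good.append(fields)
        elif reject_mode == "fail":
            raise JobAborted(f"mismatched record: {line!r}")
        elif reject_mode == "output":
            rejected.append(line)  # sent down the reject link
        # reject_mode == "continue": the record is silently dropped
    return good, rejected


data = ["1,scott,10", "2,king", "3,ford,20"]
print(read_with_reject_mode(data, 3, "continue"))
# ([['1', 'scott', '10'], ['3', 'ford', '20']], [])
print(read_with_reject_mode(data, 3, "output"))
# ([['1', 'scott', '10'], ['3', 'ford', '20']], ['2,king'])
```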

First Line is Column Names: true/false.

o If it is false, the first line is read as a record too, and it shows up as a dropped record.
o If it is true, the first line is used as the column names, so it is not dropped.
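Python's csv module can illustrate the effect of First Line is Column Names. The sketch assumes a hypothetical comma-delimited file whose columns are empno and ename, and uses keep_valid to stand in for the column type check that drops a header line read as data.

```python
import csv
import io

# Hypothetical comma-delimited file with a header line.
TEXT = "empno,ename\n7369,SMITH\n7499,ALLEN\n"


def read_file(text, first_line_is_column_names):
    """Return (column_names, records) for the given option value."""
    rows = list(csv.reader(io.StringIO(text)))
    if first_line_is_column_names:
        return rows[0], rows[1:]  # first line becomes the column names
    return None, rows             # first line is read as a normal record


def keep_valid(records):
    """Keep records whose empno is an integer; others would be dropped."""
    return [r for r in records if r[0].isdigit()]


names, records = read_file(TEXT, True)
print(names)                     # ['empno', 'ename']
_, records = read_file(TEXT, False)
print(records[0])                # ['empno', 'ename'] (read as data)
print(len(keep_valid(records)))  # 2: the header line gets dropped
```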

Missing File Mode: this option is used if any file name is missing.

Two options:
o Ok: drops the file name when it is missing.
o Error: if a file name is missing, it aborts the job.

File Name Column: "source information at the target"; it tells us which address in the local server each record came from. It directly adds a new column to the existing table and displays this information in that column.
Row Number Column: "source record number at target"; it tells us which source record number each record in the target table came from.

It also directly adds a new column to the existing table and displays the number in that column.
Read First Rows: "will get you the top first n rows".
o The Read First Rows option asks you to give a value n, and displays the first n records.
Filter: "blocking unwanted data based on UNIX filter commands", like grep, egrep, …… and so on.

Example:
o grep "moon" \\ it is case sensitive and displays only the records that contain moon.
o grep -i "moon" \\ it ignores case and displays all moon records.
o grep -w "moon" \\ it displays exact (whole word) match records.
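The File Name Column, Row Number Column, Read First Rows and Filter options can each be approximated in a few lines of Python. The helpers below are hypothetical and are not DataStage APIs. Row numbering starts at 1 here, which is an assumption. The grep-style filter uses a regular expression with word boundaries to mimic grep -w.

```python
import re
from itertools import islice


def add_source_columns(file_name, lines):
    """File Name Column + Row Number Column: tag each record with its origin."""
    return [(file_name, row_number, line)
            for row_number, line in enumerate(lines, start=1)]


def read_first_rows(lines, n):
    """Read First Rows: keep only the top n records."""
    return list(islice(lines, n))


def grep(lines, word, ignore_case=False, whole_word=False):
    """Filter like grep "word", grep -i "word" or grep -w "word"."""
    pattern = re.escape(word)
    if whole_word:
        pattern = rf"\b{pattern}\b"
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    return [line for line in lines if regex.search(line)]


lines = ["Moon river", "full moon", "moonlight", "sun"]
print(grep(lines, "moon"))                    # ['full moon', 'moonlight']
print(grep(lines, "moon", ignore_case=True))  # ['Moon river', 'full moon', 'moonlight']
print(grep(lines, "moon", whole_word=True))   # ['full moon']
print(read_first_rows(lines, 2))              # ['Moon river', 'full moon']
print(add_source_columns("src_f.txt", ["a", "b"]))
# [('src_f.txt', 1, 'a'), ('src_f.txt', 2, 'b')]
```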

Read from Multiple Nodes: we can read the data in parallel using the sequential stage.
o Reading in parallel is possible.
o Loading in parallel is not possible.

LIMITATIONS of SF:
o It does sequential processing (it processes the data sequentially).
o Memory limit of 2 GB (.txt format).
o The problem with sequential files is conversions, like ASCII -> NF -> ASCII -> NF.
o It lands or keeps the data "outside the boundary" of DataStage.


DAY 20 General settings of DataStage and about Data Set
Default setting for startup with a parallel job:
. Tools -> Options -> select a default.
. When we create a new job, it asks which type of job you want.
Types of jobs are mainframe/parallel/sequence/server. After this setting, when we restart the DS Designer it goes directly to the designer canvas.
According to naming standards, every stage has to be named.
o Naming a stage is simple: right click on a stage, choose the rename option, and name the stage as per the naming standards.

General Stage: some stages in this category are used for commenting, i.e., describing how a stage behaves or what a stage can do; simply, giving comments for a stage. Let us discuss Annotation & Description Annotation.
. Annotation: it is for stage comments.
. Description Annotation: it is used for the job title (any one title can be kept).

Parallel Capable of 3 jobs:

SRC (extracting) -> TRG (landing the data); the data resides in LS / RR / db.

Q: In which format is the data sent between the source file and the target file?

A: If we send a .txt file from the source, it is in ASCII format, because a .txt file supports only ASCII format, while DataStage supports only its Native format. So the ASCII code is converted into the Native format, which DataStage understands. At the target, the Native format is converted back into ASCII (.txt format) so that it is visible to the user/client.
"Native Format" is also called a Virtual Dataset.
Figure: src_f.txt (ASCII) -> NF -> trg_f.txt (ASCII)
o When we convert ASCII code into NF, the SRC needs an operator (the import operator).
o When we convert NF code into ASCII, the target needs an operator (the export operator).

Data Set (DS): "It is a file stage, and it is used for staging the data when we design dependent jobs". Data Set overcomes the limitations of the sequential file stage and gives better performance. By default, Data Set sends the data in parallel. In a Data Set the data lands in "Native Format".
Q: How does the Data Set overcome the sequential file limitations?
. By default the data is processed in parallel.
. It can hold more than 2 GB.
. No conversion is needed, because the Dataset data resides directly in Native format.
. The data lands in the DataStage repository.
. The Data Set extension is *.ds

Q: How is the conversion easy in a Data Set?
A: In a first job, src_f.txt -> trg_f.ds, with the structure saved as "st_trg". We can copy the "trg_f.ds" file name, and we must also save the structure of trg_f.ds, for example as st_trg. We can then use the saved file name and structure of the target in another job:
trg_f.ds -> trg_f.txt, copying the structure st_trg & trg_f.ds for reuse here.

A Data Set can read only Native Format files, just as DataStage reads only the orchestrate format.


DAY 21 Types of Data Set (DS)
There are two types of Data Set:
. Virtual (temporary)
. Persistent (permanent)

Virtual: the data moves in the link from one stage to another stage, i.e., the link holds the data temporarily.
Persistent: the data sent from the link lands directly in the repository. That data is permanent.

Alias of Data Set:
o ORCHESTRATE FILE
o OS FILE
Q: How many files are created internally when we create a data set?
A: A Data Set is not a single file; it creates multiple files internally:
o Descriptor file
o Data file
o Control file
o Header file
Descriptor File: it contains the schema details and the address of the data.

Data File: it consists of the data in Native Format and resides in the DataStage repository.

Control File and Header File: they reside in the operating system, and both act as the interface between the descriptor file and the data file. A physical file means it is stored on the local drive / local server; it is permanently stored in the install program files: c:\ibm\inser..\server\dataset{"pools"}

Q: How can we organize a Data Set, i.e., view/copy/delete it in real time, etc.?
A: Case 1: we can't directly delete the Data Set. Case 2: we can't directly see or view it.
A Data Set is organized using utilities:
o Using the GUI, i.e., the utility under Tools (Dataset Management).
o Using the command line: we have to start with $orchadmin.
Navigation to organize a Data Set in the GUI:
o Tools -> Dataset Management -> File_name.ds (e.g., dataset.ds)
o Then we see the general information of the dataset: Schema window, Data window, Copy window, Delete window.

At the command line:
o $orchadmin rm dataset.ds (this is the correct process) \\ this command removes a data set
o $rm dataset.ds (this is the wrong process) \\ we cannot write it like this
o $ds records \\ to view files in a folder

Q: Which operator is associated with a Dataset?
A: A Dataset doesn't have an operator of its own, but it uses the copy operator as its operator.

Dataset Version:
. A Dataset has version control.
. A Dataset has a version for each DataStage version.
. The default version in 8 is 4.1, i.e., v41.

Q: How do we perform version control at run time?
A: We have to set an environment variable for this. Navigation for setting the environment variable:
Job properties -> Parameters -> Add environment variable -> Compile -> Dataset version ($APT_WRITE_DS_VERSION); click on that.
After doing this, when we want to save the job, it asks which version you want.


DAY 22 File Set & Sequential File (SF) input properties
File Set (FS): "It is also used for staging the data".
. Like the Data Set, it is a file stage used to design dependent jobs.
. Data Set & File Set are the same, but they have minor differences. The differences between DS & FS are shown below:

Data Set | File Set
Having parallel extendable capabilities | Having parallel extendable capabilities
More than 2 GB limit | More than 2 GB limit
NO REJECT link with the Dataset | REJECT LINK within File Set
DS is exclusively for internal use (DataStage environment) | External application: to create FS we use any other application
Copy (file name) operator | Import / Export operator
Native format | Binary format
.ds files saves | .fs extension

But the Data Set has better performance than the File Set.

Sequential File Stage: input properties
. Input properties are set at the target file, and at the target there are four properties:
1. File update mode 2. Cleanup on failure 3. First line is column names 4. Reject Mode

File Update Mode: it has three options: append / create (error if exists) / overwrite.
o Append: when multiple files or a single file is sent to the sequential target, it appends one file after another into a single file.
o Create (error if exists): it creates the file if it does not exist, and gives an error if it already exists.
o Overwrite: it overwrites the old file with the new one.

Setting the value at run time (for file update mode):
o Job properties -> Parameters -> Add environment variables -> Parallel -> Automatically overwrite ($APT_CLOBBER_OUTPUT)
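The three File Update Mode options map onto Python's file open modes. "a" appends. "x" creates the file and raises FileExistsError if it already exists. "w" overwrites. The sketch below uses that mapping only as an analogy for the stage's behavior. write_records and the temporary target path are hypothetical.

```python
import os
import tempfile

# File Update Mode -> Python open() mode
OPEN_MODES = {"append": "a", "create": "x", "overwrite": "w"}


def write_records(path, records, update_mode):
    """Write records to a sequential target using the given update mode."""
    with open(path, OPEN_MODES[update_mode], encoding="utf-8") as f:
        for record in records:
            f.write(record + "\n")


def read_back(path):
    with open(path, encoding="utf-8") as f:
        return f.read()


target = os.path.join(tempfile.mkdtemp(), "trg_f.txt")
write_records(target, ["a"], "create")     # new file: OK
write_records(target, ["b"], "append")     # added after the existing data
after_append = read_back(target)           # 'a\nb\n'
write_records(target, ["c"], "overwrite")  # old contents replaced
after_overwrite = read_back(target)        # 'c\n'
try:
    write_records(target, ["d"], "create")  # file exists, so this fails
    create_failed = False
except FileExistsError:
    create_failed = True
print(repr(after_append), repr(after_overwrite), create_failed)
```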

Cleanup on Failure: it has two options: true/false.
True - when cleanup on failure is true, it removes the partially written records if the job fails. It works only when "file update mode" is equal to append.
False - it simply appends or overwrites the records.

First Line is Column Names: it has two options: true/false.
True - the first row or record is used as the column names (fields).
False - it simply reads every row, including the first row, as a record.

Reject mode: the reject mode here is the same as in the output properties we discussed before. It has three options: continue/fail/output.
Continue - it just drops the data when the format/condition/data type mismatches, and continues processing the remaining records.
Fail - it just aborts the job when format/condition/data type mismatches are found.
Output - it captures the dropped record data.

DAY 23 Development & Debug Stage
The development and debug stage has three categories; they are:
1. Stages that generate data:
a. Row Generator b. Column Generator
2. Stages that are used to pick sample data:
a. Head b. Tail c. Sample
3. The stage that helps in debugging:
a. Peek
Simply put, in development and debug we have 6 types of stages, and the 6 stages are divided into three categories as shown above.

23.1. Stages that Generate Data:
Row Generator Data: "It has only one output"
. The row generator is for generating sample data; it is used in some cases. Some cases are:
o For testing purposes.
o When the client is unable to give the data.
o To make the job design simple.
. The Row Generator can generate junk data automatically based on the data type, or we can manually set some related, understandable data by giving user-defined values.
. It has only one property; we select a structure for creating the junk data.

Row Generator design as below:
ROW Generator -> DS_TRG
Navigation for Row Generator: open the RG properties:
o Properties: Number of records = XXX (user defined value)
o Column: load the structure or Meta data if it exists, or we can type it there.

For example, n = 30: data is generated for 30 records, and the junk data is generated based on the data type.
Q: How do we generate user-defined values instead of junk data?
A: First we must go to the RG properties -> Column -> double click the serial number or press Ctrl+E -> Generator


Type = cycle/random (for the integer data type). For the integer data type we have three options.
Under cycle type:
There are three types of cycle generated data: Increment, Initial value and Limit.
Q: When we select initial value = 30?
A: It starts from 30.
Q: When we select increment = 45?
A: It generates the cycle values in steps of 45, i.e., it adds 45 to each value to get the next one.
Q: When we select limit = 20?
A: It generates values up to the limit number, in a cycle form.
Under random type:
There are three types of random generated data: limit, seed and signed.
Q: When we select limit = 20?
A: It generates random values up to limit = 20, and continues if there are more than 20 rows.
Q: When we select seed = XX?
A: It generates the junk data for the random values.
Q: When we select signed?
A: It generates signed values for the field (values between -limit and +limit); otherwise it generates values between 0 and +limit.

Column Generator Data: "it has one input and one output"
. The main purpose of the column generator is to group a table as one.

. With it we add an extra column, and junk data is generated for the added column in the output.
. Here the mapping should be done in the column generator properties, which means just dragging and dropping the created column into the existing table.
Design: Sequential file -> Column Generator -> DataSet
Coming to the column generator properties: to open the properties, just double click on the stage.
Stage -> Options -> Column to generate = ?, and so on; we can give as many as required.

Navigation:
Output -> Mapping: after adding an extra column it is visible here; for mapping, we simply drag it to the existing table on the right side.
Column: we can change the data type as required.
In the output:
. The junk data is generated automatically for the extra added columns.
. For manual values, we can generate some meaningful data for the extra columns. Navigation for manual values: Column -> Ctrl+E -> Generator

o Algorithm = two options, "cycle / alphabet"
o Cycle - it has only one option, i.e., value.
o Alphabet - it also has only one option, i.e., string.
. Cycle is the same as shown above for the row generator.
Q: When we select alphabet with string = naveen?
A: It generates different rows from the given string, alphabet-wise.
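The integer cycle and random settings can be modeled with two small Python generators, and the Column Generator idea with a helper that appends a generated value to each row. These notes do not spell out the exact DataStage rules, so the sketch makes three assumptions. A cycle starts at the initial value, adds the increment on every row, and wraps back to the initial value once it would pass the limit. Random values fall in 0..limit, or -limit..limit when signed. The seed makes the random values repeatable. All function names are hypothetical.

```python
import random
from itertools import islice


def cycle_values(initial_value=0, increment=1, limit=None):
    """Cycle: initial, initial+increment, ..., wrapping once past the limit."""
    value = initial_value
    while True:
        yield value
        value += increment
        if limit is not None and value > limit:
            value = initial_value


def random_values(limit, seed=None, signed=False):
    """Random: integers in 0..limit, or -limit..limit when signed."""
    rng = random.Random(seed)
    low = -limit if signed else 0
    while True:
        yield rng.randint(low, limit)


def add_generated_column(rows, values):
    """Column Generator: append one generated value to each existing row."""
    return [row + (value,) for row, value in zip(rows, values)]


print(list(islice(cycle_values(30, 45, 120), 5)))  # [30, 75, 120, 30, 75]
print(add_generated_column([("scott",), ("king",)], cycle_values()))
# [('scott', 0), ('king', 1)]
```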

DAY 24 Pick sample Data & Peek
24.1. Pick sample data: "it is a debug stage; there are three types of pick sample data":
. Head . Tail . Sample

Head: "it reads the top 'n' records of every partition".
o It has one input and one output.
o In the head stage, mapping must be done.
Design: SF_SRC -> HEAD -> DS_TRG
Properties of Head:
o Rows
. All Rows (after skip) = false: if set to true, it copies all rows to the output following any requested skip positioning.
. Number of rows (per partition) = XX: it copies this number of rows from input to output per partition.
o Partitions
. All partitions = true
True: copies rows from all partitions. False: copies from specific partition numbers, which must be specified.

Tail: "it is a debug stage that reads the bottom 'n' rows from every partition"
o The tail stage has one input and one output.
o In this stage mapping must be done; the mapping is done in the tail output properties.
Design: SF_SRC -> TAIL_3 -> DS_TRG
Properties of Tail:
o The properties of head and tail are similar to the ones shown above.
o Mainly, we must give the value for "number of rows to display".
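Head and Tail both work per partition. A short sketch with partitions as Python lists follows. The function names are hypothetical. The skip parameter is an assumption that stands in for the skip positioning mentioned under the Head properties. partition_numbers=None plays the role of All partitions = true.

```python
def head(partitions, n, skip=0, partition_numbers=None):
    """First n rows of each selected partition, after skipping `skip` rows."""
    selected = (range(len(partitions)) if partition_numbers is None
                else partition_numbers)
    return [partitions[p][skip:skip + n] for p in selected]


def tail(partitions, n):
    """Last n rows of each partition."""
    return [part[-n:] if n > 0 else [] for part in partitions]


parts = [[1, 2, 3, 4], [5, 6, 7], [8, 9]]
print(head(parts, 2))                         # [[1, 2], [5, 6], [8, 9]]
print(head(parts, 2, skip=1))                 # [[2, 3], [6, 7], [9]]
print(head(parts, 2, partition_numbers=[1]))  # [[5, 6]]
print(tail(parts, 2))                         # [[3, 4], [6, 7], [8, 9]]
```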
Sample: "it is also a debug stage, with two modes: period and percentage"
o Period: in this mode it supports one input and one output.
o Percentage: in this mode it supports one input and multiple outputs.
Design: SF_SRC -> SAMPLE -> DS_TRG

Period: if I have some records in the source table and give a number 'n' as the period value, it displays or retrieves every nth record from the source table.
Skip: it also displays or retrieves every nth record from the given source table.

Percentage: it reads from one input and writes to multiple outputs.
o Coming to the properties: Options
. Percentage = 25, and we must set target = 1
. Percentage = 50, target = 0
. Percentage = 15, target = 2
o The target number we set here is called the link order.
o Link Order: it specifies to which output the specific data has to be sent.
o Mapping: it must be done for the multiple outputs.

Design: SF_SRC -> SAMPLE -> Target1, Target2, Target3

NOTE: the sum of the percentages of all outputs must be less than or equal to ('<=') 100% of the 'n' input records.
o In percentage mode, it distributes the data in percentage form. When the sample receives 90% of the data from the source, it considers that 90% as 100% and distributes it as we specify.
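Period and percentage sampling can be sketched as follows. The notes do not settle two points, so the sketch assumes them. Period mode keeps records n, 2n, 3n, ... of the input. Percentage mode sends each record to at most one output, chosen at random so that each output gets roughly its percentage. Outputs are keyed by link order (the target number), as in the 25/50/15 example. The function names are hypothetical.

```python
import random


def sample_period(records, period):
    """Period mode: keep every period-th record (the nth, 2nth, ...)."""
    return records[period - 1::period]


def sample_percentage(records, percent_by_link, seed=None):
    """Percentage mode: route records to outputs by link order.

    percent_by_link maps link order -> percentage; the total must be <= 100.
    Records not picked for any output are discarded.
    """
    if sum(percent_by_link.values()) > 100:
        raise ValueError("sum of percentages must be <= 100")
    rng = random.Random(seed)
    outputs = {link: [] for link in percent_by_link}
    for record in records:
        draw = rng.uniform(0, 100)
        cumulative = 0
        for link, percent in percent_by_link.items():
            cumulative += percent
            if draw < cumulative:
                outputs[link].append(record)
                break
    return outputs


print(sample_period(list(range(1, 11)), 3))  # [3, 6, 9]
outputs = sample_percentage(list(range(1000)), {1: 25, 0: 50, 2: 15}, seed=42)
print({link: len(rows) for link, rows in outputs.items()})  # counts near 250, 500, 150
```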

24.2. PEEK:

"it is a debug stage and it helps in debugging"

Design: SF_SRC -> PEEK
It is used in three ways; they are:
1. It can be used to copy the data from the source to multiple outputs.
2. It sends the data into the logs.
3. It can be used as a stub stage.
Q: How do we send the data into the logs?
Open the properties of the peek stage; we must assign:
o Number of rows = (value)
o Peek record output mode = job log, and so on, as per the options
o If we set column name = false, it doesn't show the column names in the log.

To see the log records that we stored:
o In DS Director: Peek -> log -> peek. Here we see the 'n' values of records and fields.

Q: When does the peek act as a copy stage?
A: When the sequential file doesn't send the data to multiple outputs; at that time the peek acts as a copy stage.
Q: What is a Stub Stage?
A: A stub stage is a place holder. In some situations a client requires only the dropped data. At that time the stub stage acts as a place holder that holds the output data temporarily, and the rejected data is sent to another file.

DAY 25 Database Stages
In this category we generally use Oracle Enterprise, ODBC Enterprise, Teradata with ODBC, Dynamic RDBMS and so on.
25.1. Oracle Enterprise:

"Oracle Enterprise is a data base stage; it reads tables from the oracle data base at the source and passes them to the target"
o Oracle Enterprise can read multiple tables, but it loads them into one output.

Design: Oracle Enterprise -> Data Set
o Properties of Oracle Enterprise (OE):

Read Method has four options:
Auto Generated \\ it generates the query automatically
SQL Builder \\ it is a new concept in v8 compared with v7
Table \\ the table name is given here
User Defined \\ here we give a user-defined SQL query

If we select the table option: Table = "<table name>"

Connection: Password = *****, User = Scott, Remote server = oracle

o Navigation for loading the data structure into the columns, when the table definition is already present in the plug-in:
Select the load option in Column -> go to the table definitions -> then to the plug-in -> load the EMP table from there.

If the table is not in the plug-in:
Select the load option in Column -> go to Import -> Import "meta data definition" -> select the related plug-in (Oracle) -> User id: Scott


Password: tiger. After loading, select the specific table and import.

After importing into the columns, in the definitions we must change the hire date data type to "Time Stamp".

Q: A table contains 300 records, and I need only 100 of them. How?
A: In the read method we use a user-defined SQL query, writing a query that reads 100 records. With the first read method option (Auto Generated), we can auto-generate the query and reuse it by copying the query statement into the user-defined SQL.
Q: What can we do when we don't know how to write a select command?
A: Select read method = SQL Builder. After selecting the SQL Builder option from the read method:
o Oracle 10g
o From there, drag whichever table you want.
o Then select a column, or double click in the dragged table. There we can select the condition we need. It is totally automated.
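The first question above comes down to a user-defined SELECT with a row limit. An Oracle database can't be part of a self-contained example, so this sketch uses Python's built-in sqlite3 module as a stand-in. SQLite writes the limit as LIMIT 100. On Oracle, the user-defined query would typically use WHERE ROWNUM <= 100 instead. The three-column EMP table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, ename TEXT, sal REAL)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(i, f"emp{i}", 1000.0 + i) for i in range(1, 301)],  # 300 records
)

# User-defined SQL that reads only 100 records
# (on Oracle this would typically be: WHERE ROWNUM <= 100).
user_defined_sql = "SELECT empno, ename, sal FROM emp ORDER BY empno LIMIT 100"
rows = conn.execute(user_defined_sql).fetchall()
print(len(rows))  # 100
conn.close()
```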

NOTE: in version 7.5.x2 we can't save and reuse the properties. Data connection: its main purpose is reusing the saved properties.
Q: How do we reuse the saved properties?
A: Navigation for saving and reusing the properties: open the OE properties -> select Stage -> Data connection -> load the saved dc there:
o Naveen_dbc \\ it is a saved dc
o Save in table definition.

DAY 26 ODBC Enterprise
ODBC Enterprise is a database stage.
About ODBC Enterprise: Oracle needs some plug-ins to connect to DataStage. When DataStage version 7 was released, Oracle 9i provided some drivers to use. For the connection itself, Oracle Enterprise connects directly to the Oracle database, but ODBC Enterprise needs OS drivers to hit (connect to) the Oracle database.

Oracle Enterprise -> ORACLE DB (directly hitting)
ODBC Enterprise -> OS -> ORACLE DB (uses OS drivers to hit the Oracle DB)

Differences between Oracle Enterprise (OE) and ODBC Enterprise (ODBCE):
OE:
o Version dependent
o Good performance
o Specific to Oracle
o Uses plug-ins
o No rejects at source
ODBCE:
o Version independent
o Poor performance
o For multiple DBs
o Uses OS drivers
o Rejects at SRC & TRG

Q: How do we connect to a database using ODBC?

Design: ODBCE -> Data Set
First step: open the properties of ODBCE:
o Read method = Table
o Table = EMP
Connection:
o Data Source = WHR (WHR is the name of the ODBC driver)


o Password = ******
o User = Scott
Creating the WHR ODBC driver at OS level:
o Administrative Tools -> ODBC -> Add
o Microsoft ODBC for Oracle -> give the name WHR, and provide user name = Scott and server = tiger.

Setting up the ODBCE driver at OS level is a lengthy process. The ODBC Connector was introduced to overcome this, and using it is quicker than using ODBCE. The best feature of the ODBC Connector is "schema reconciliation", which automatically handles data type mismatches between the source data types and the DataStage data types.
Differences between ODBCE and the ODBC Connector:

ODBCE:
o It cannot list the Data Source Names (DSN).
o In ODBCE there is no testing of the connection.
o ODBCE reads sequentially and loads sequentially.

ODBC Connector:
o It provides the list of DSNs we have in ODBC.
o We can test the connection with the Test button.
o It reads in parallel and loads in parallel (good performance).

Properties of the ODBC Connector:
o Select Data Source Name: DSN = WHR
o User name = Scott
o Password = *****
o SQL query

26.1 MS Excel with ODBCE:

The first step is to create an MS Excel file, which is called a "workbook". A workbook can have 'n' sheets. For example, a CUST workbook is created.
Q: How do we read an Excel workbook with ODBCE?
A: Open the properties of ODBCE:
o Read method = Table
o Table = "empl$" (when reading from Excel, the sheet name must be in double quotes and end with the $ symbol)

Connection:
o DSN = EXE
o Password = *****
o User = xxxxx
Columns:
o Load -> Import ODBC Table Definitions
o DSN: select the workbook here, and give the user ID & password
o Filter: enable it by clicking Include System Tables
o Select the tables you need & click OK

In the operating system:
o Add in ODBC -> MS EXCEL driver -> Name = EXE (this is the DSN)

Q: How do you read an Excel file in a Sequential File stage?
A: By converting the CUST Excel file into CUST.csv.

26.2 Teradata with ODBCE:
Teradata is a database product that companies use, like Oracle.
Q: How do we read Teradata with ODBC?
A: We must start the Teradata connection (by clicking its shortcut).
o In the OS we must also start it: Start -> Control Panel -> Administrative Tools -> Services -> Teradata DB initiator (it must be started here).

o Add a DSN in the ODBC drivers: select Teradata in the Add list. We must provide the details shown below:
User ID = tduser, Password = tduser, Server: 127.0.0.1
After this, we must open the properties of ODBCE:
o Read method = Table, Table = financial.customer

o Connection: UID = tduser, PWD = tduser
o Columns: Load -> Import Table Definitions -> Plug-in -> Teradata: Server: 127.0.0.1, UID = tduser, PWD = tduser, DSN = tduser

After all this navigation, we can finally view the data we have loaded in the source.

DAY 27 Dynamic RDBMS and PROCESSING STAGE
27.1 Dynamic RDBMS: "It is a database stage; it is also called DRS." It supports multiple inputs and multiple outputs.

Design: DRS -> Ln_EMP_Data -> Data Set
        DRS -> Ln_DEPT_Data -> Data Set
DRS has almost the same properties as Oracle Enterprise. Coming to the DRS properties:
o Select the DB type, i.e., Oracle
o Oracle: Scott / Tiger (for authentication)
At Output: Ln_EMP_Data (set the EMP table here) and Ln_DEPT_Data (set the DEPT table here).
o Columns: load the metadata for the EMP & DEPT tables.

In Oracle Enterprise we can read multiple tables, but we can't load them into multiple outputs. DRS solves this problem: with DRS we can read multiple tables and load them into multiple outputs.

Some of the database stages:
o Netezza can be used as a target only (set in the input properties).
o iWay can be used as a source only (set in the output properties).

27.2 Processing Stage: There are 28 processing stages, but we generally use 10, and these 10 stages are very important. They are: 1. Transformer 2. Lookup 3. Join 4. Copy 5. Funnel 6. Remove Duplicates 7. Slowly Changing Dimension 8. Modify 9. Sort 10. Surrogate Key

27.3 Transformer Stage: The symbol of the Transformer stage is [icon not reproduced].

A simple query that we solve by using the Transformer:


Q: Calculate the salary plus commission of each employee from the EMP table.

Design: Oracle Enterprise -> Transformer -> Data Set

Oracle Enterprise: here, set the connection and load the metadata into Columns.
Transformer: here the source fields and structure are available, and the mapping should be done. The Transformer stage is an "all-in-one stage".

Properties of the Transformer Stage:
o For the above question we must create a column to hold the derivation. In the lower output properties, click an empty position and name the new column NETSAL. By double-clicking NETSAL we can write its derivation, for example IN.SAL + IN.COMM. Right-clicking there shows the input columns, functions, and so on.

After that, when we execute the job, records with null values are dropped and only the remaining records are sent to the target.
o To handle this, we can use a function in the derivation: IN.SAL + NullToZero(IN.COMM)
o With this derivation, records with a null COMM also reach the target.
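A minimal Python sketch of why NullToZero matters, assuming Python's None stands in for a DataStage null. null_to_zero, naive_net_salary and net_salary are hypothetical helper names, not DataStage functions, and the EMP values are made up for illustration.

```python
from typing import Optional

def null_to_zero(value: Optional[float]) -> float:
    """Stand-in for NullToZero: a null (None) becomes 0, other values pass through."""
    return 0 if value is None else value

def naive_net_salary(sal: float, comm: Optional[float]) -> Optional[float]:
    """IN.SAL + IN.COMM: a null operand makes the whole result null."""
    return None if comm is None else sal + comm

def net_salary(sal: float, comm: Optional[float]) -> float:
    """IN.SAL + NullToZero(IN.COMM): a null COMM no longer loses the record."""
    return sal + null_to_zero(comm)

# Hypothetical EMP rows: one with a null COMM, one with a value.
emp = [{"EID": 7369, "SAL": 800, "COMM": None},
       {"EID": 7499, "SAL": 1600, "COMM": 300}]

# Rows whose naive result is null are the ones that were dropped in the notes.
naive = [r for r in emp if naive_net_salary(r["SAL"], r["COMM"]) is not None]
fixed = [(r["EID"], net_salary(r["SAL"], r["COMM"])) for r in emp]
print(len(naive), fixed)
```

Only one row survives the naive expression, while both rows get a NETSAL once the null is converted to 0.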

Q: TakeHome is NETSAL (SAL + COMM) minus or plus 200. How do we include this logic in a derivation?
A: Add a THome column in the output properties, and put this logic in the THome derivation:
Logic: if NETSAL > 2000 then TakeHome = NETSAL - 200 else TakeHome = NETSAL + 200

If (IN.SAL + NullToZero(IN.COMM)) > 2000
Then (IN.SAL + NullToZero(IN.COMM)) - 200
Else (IN.SAL + NullToZero(IN.COMM)) + 200
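A small Python sketch of the TakeHome rule, with +200 in the else branch as in the derivation. The net salary is computed once per record and reused in both branches, which is what a stage variable such as NS achieves. take_home is a hypothetical name, and None stands in for a null COMM.

```python
from typing import Optional

def take_home(sal: float, comm: Optional[float]) -> float:
    # Computed once per record, like a stage variable NS = IN.SAL + NullToZero(IN.COMM)
    ns = sal + (0 if comm is None else comm)
    # THome = if (NS > 2000) then (NS - 200) else (NS + 200)
    return ns - 200 if ns > 2000 else ns + 200

for sal, comm in [(2500, None), (1600, 300), (2000, 0)]:
    print(sal, comm, take_home(sal, comm))
```

A NETSAL of exactly 2000 is not "> 2000", so it falls into the +200 branch.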

o With this logic the same expression is written several times per record, which takes more time on huge volumes of records. The best way to overcome this is a stage variable.
Stage Variable: "a temporary variable that holds a value until the process completes, and whose result is not sent to the output."
The stage variable option is in the toolbar of the Transformer properties. After clicking it, Stage Variables become visible in the input properties. There we add a column, for example NS (name NS, initial value 0, type Integer, precision 4, scale 0). After adding the NS column, give it the derivation IN.SAL + NullToZero(IN.COMM).
Then add these derivations to the created output columns:
o NETSAL = NS
o THome = if (NS > 2000) then (NS - 200) else (NS + 200)

DAY 28 Transformer Functions-I

Examples on Transformer Functions:
1. Left Function 2. Right Function 3. Substring Function 4. Concatenate Function 5. Field Function 6. Constraints Function (Filter)
For example, from the word MINDQUEST we need only QUE:
o Right function: R(L(7),3), i.e., Right(Left("MINDQUEST", 7), 3)
o Left function: L(R(5),3), i.e., Left(Right("MINDQUEST", 5), 3)
o Substring: SST(5,3), i.e., 3 characters starting at position 5
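The three ways of extracting QUE can be checked with a short Python sketch. It assumes 1-based positions as in the notes. left, right and substring are hypothetical helpers that mirror the Transformer functions; they are not DataStage code.

```python
def left(s: str, n: int) -> str:
    """First n characters, like Left(s, n)."""
    return s[:n]

def right(s: str, n: int) -> str:
    """Last n characters, like Right(s, n)."""
    return s[-n:] if n > 0 else ""

def substring(s: str, start: int, length: int) -> str:
    """`length` characters from 1-based position `start`, like SST(start, length)."""
    return s[start - 1:start - 1 + length]

word = "MINDQUEST"
print(right(left(word, 7), 3), left(right(word, 5), 3), substring(word, 5, 3))
```

All three expressions print QUE. Left(7) keeps "MINDQUE" and Right(5) keeps "QUEST", so each approach reaches the same characters.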

Filter: DataStage can filter in 3 different ways:
1. Source level
2. Stages (Filter, Switch, External Filter)
3. Constraints (Transformer, Lookup)
Constraints: "In the Transformer, constraints are used as filters; that is, a constraint is also called a filter."
Q: How is a constraint used in the Transformer?
A: In the Transformer properties, we see a Constraint row on the output link. There we can write the derivation by double-clicking.

Differences between the Basic Transformer and the Parallel Transformer:
Basic Transformer:
o It affects performance.
o Basic Tx can only execute up to SMP.
o Basic Tx can call routines written in BASIC and shell.
Parallel Transformer:
o It doesn't affect performance, but it affects compile time.
o It can execute on any platform.
o It supports a wide range of languages.

NOTE: The Tx is very sensitive with respect to data types; the source and target cannot have different data types.
Q: How can the file below be read and processed with operations such as filtering and separating, using the Left, Right, and Substring functions, with the date displayed as DD-MM-YYYY?
A: File.txt:
    HINV72340976045432120080203DOL
    TPID5650 5 8261.99
    TPID5655 4 2861.69
    TPID5657 7 6218.96
    HINV71230476046762120080304EUO
    TPID5640 3 5234.00
    TPID5645 2 7855.67
    TPID5657 9 7452.28
    HINV74320576067632120080405EUO
    TPID5630 8 1657.57
    TPID5635 6 9564.13
    TPID5637 1 2343.64

Design: SF -(IN1)-> Tx1 -(IN2)-> Tx2 -(IN3)-> Tx3 -(OUT)-> DS

There are five steps in total to solve the given question:
Step 1: Load file.txt into the Sequential File. In the Sequential File properties, load each whole line as one record: create one column called REC. There is no need to load metadata for this.
Step 2: IN1 Tx properties: in this step we filter the records starting with "H" from the given file. Here we create two columns, TYPE and DATA.

Input IN1: REC -> Output IN2 (Derivation -> Column):
  Constraint: Left(IN1.REC, 1) = "H"
  Left(IN1.REC, 1) -> TYPE
  IN1.REC -> DATA

Step 3: IN2 Tx properties: here we create four columns and separate the data into them.
  Input IN2: TYPE, DATA -> Output IN3: INVNO, CID, BILL_DATE, CURR
  Derivations used: Left(Right(IN2.DATA, 21), 9); IN2.DATA[20, 8]

Step 4: IN3 Tx properties: here the BILL_DATE column is changed into DD-MM-YYYY format using stage variables.
  Stage variables (Derivation -> Variable):
    Right(IN3.BILL_DATE, 2) -> D
    Right(Left(IN3.BILL_DATE, 6), 2) -> M
    Left(IN3.BILL_DATE, 4) -> Y
  Output OUT (Derivation -> Column):
    IN3.INVNO -> INVNO
    IN3.CID -> CID
    D : '-' : M : '-' : Y -> BILL_DATE

Step 5: Here, set the output file name so that BILL_DATE is displayed.
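A Python sketch of Steps 2 to 4 applied to some of the File.txt records above. It assumes the 8-character bill date starts at position 20 of the header record, as the [20, 8] substring in the notes suggests (1-based). The helper names are hypothetical.

```python
def is_header(rec: str) -> bool:
    # Step 2 constraint: Left(IN1.REC, 1) = "H"
    return rec[:1] == "H"

def bill_date(data: str) -> str:
    # Step 3: DATA[20, 8] -> 8 characters starting at 1-based position 20
    return data[19:27]

def to_dd_mm_yyyy(yyyymmdd: str) -> str:
    # Step 4 stage variables, then D : '-' : M : '-' : Y
    d = yyyymmdd[-2:]        # Right(BILL_DATE, 2)
    m = yyyymmdd[:6][-2:]    # Right(Left(BILL_DATE, 6), 2)
    y = yyyymmdd[:4]         # Left(BILL_DATE, 4)
    return d + "-" + m + "-" + y

records = [
    "HINV72340976045432120080203DOL",
    "TPID5650 5 8261.99",
    "HINV71230476046762120080304EUO",
    "TPID5640 3 5234.00",
]
dates = [to_dd_mm_yyyy(bill_date(r)) for r in records if is_header(r)]
print(dates)
```

The detail (TPID) records are filtered out by the constraint, and each header's YYYYMMDD date comes out as DD-MM-YYYY.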

DAY 29 Transformer Functions-II
Examples on Transformer Functions II:
1. Field Function: "it separates the fields using a delimiter".
2. Trim: "it removes all special characters".
3. Trim B: "it removes all trailing spaces".
4. Trim F: "it removes all leading spaces".
5. Trim T & L: "it removes all trailing and leading spaces".
6. Strip White Spaces: "it removes all spaces".
7. Compact White Spaces: "it removes leading, trailing, and extra in-between spaces".

Q: A file.txt contains special characters, comma delimiters, and spaces (before, after, and in between). How do we clean it with the above functions so that each row ends up as one record?
File.txt:
    EID,ENAME,STATE
    111, NaVeen, AP
    222^, MUnNA, TN
    ^333, Sra van, KN^
    444, ^ San DeeP, KN
    555, anvesh,MH

Design: SF -(IN1)-> Tx1 -(IN2)-> Tx2 -(IN3)-> Tx3 -(OUT)-> DS
There are five steps in total to solve File.txt using the above functions:
Step 1: Here, extract file.txt and put all the data of each line into one record, in a new column called REC. There is no need to load metadata for this. Point to remember: keep First Line is Column Names = True.
Step 2: IN1 Tx properties: link IN1 carries REC. Divide REC into fields by the comma delimiter, i.e., using the Field function.
Input IN1: REC -> Output IN2 (Derivation -> Column):
  Field(IN1.REC, ",", 1) -> EID
  Field(IN1.REC, ",", 2) -> ENAME
  Field(IN1.REC, ",", 3) -> STATE

Step 3: IN2 Tx properties: here we remove special characters and spaces, and convert lower case into upper case, using the Trim, Strip White Spaces (SWS), and Upcase functions.
Input IN2: EID, ENAME, STATE -> Output IN3 (Derivation -> Column):
  Trim(IN2.EID, "^", "A") -> EID
  Upcase(Trim(SWS(IN2.ENAME), "^", "A")) -> ENAME
  STATE is also passed to IN3.

Step 4: IN3 Tx properties: here all the fields that were divided are concatenated again, i.e., all columns are added back into one REC.
Input IN3: EID, ENAME, STATE -> Output OUT (Derivation -> Column):
  IN3.EID : IN3.ENAME : IN3.STATE -> REC

Step 5: For the output, assign a target file. At the end the answer is displayed as one record per row, with the special characters and spaces removed by the transformer functions applied to file.txt above.
Final output (Trg_file.ds, column REC):
    111NAVEEN AP
    222 MUNNATN
    333SRAVAN KN
    444SAN DEEPKN
    555 ANVESHMH
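The Field/Trim/SWS/Upcase pipeline can be sketched in Python with these assumptions:
o field, trim_all and strip_white_space are hypothetical stand-ins for Field, Trim(s, "^", "A") and SWS, not DataStage code.
o The notes do not show a STATE derivation, so cleaning STATE the same way is an assumption. For that reason this sketch drops every space, and its output differs from the spacing in the final output above.

```python
def field(s: str, delim: str, n: int) -> str:
    """Like Field(s, delim, n): the n-th delimited field (1-based), '' if missing."""
    parts = s.split(delim)
    return parts[n - 1] if 1 <= n <= len(parts) else ""

def trim_all(s: str, ch: str) -> str:
    """Like Trim(s, ch, "A"): remove every occurrence of ch."""
    return s.replace(ch, "")

def strip_white_space(s: str) -> str:
    """Like SWS: remove all white space."""
    return "".join(s.split())

def clean(rec: str) -> str:
    eid = trim_all(field(rec, ",", 1), "^")
    ename = trim_all(strip_white_space(field(rec, ",", 2)), "^").upper()
    state = trim_all(strip_white_space(field(rec, ",", 3)), "^")   # assumption
    return eid + ename + state          # IN3.EID : IN3.ENAME : IN3.STATE

for line in ["111, NaVeen, AP", "222^, MUnNA, TN", "^333, Sra van, KN^",
             "444, ^ San DeeP, KN", "555, anvesh,MH"]:
    print(clean(line))
```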

29.1 Re-Structure Stage: 1. Column Export 2. Column Import
Column Export: "it is used to combine multiple columns into a single column"; it is like concatenation in the Transformer.
Properties (Input, Output):
o Column method = Explicit
o Column To Export = EID, ENAME, STATE
o Export column type = "varchar"
o Export output column = REC
Column Import: "it is used to explode a single column into multiple columns"; it is like the Field function in the Transformer.
Properties (Input, Output):
o Column To Import = REC
o Import column type = "varchar"
o Import output columns = EID, ENAME, STATE

DAY 30 JOB Parameters (Dynamic Binding)

Dynamic Binding: "Passing values at runtime, after the job has been compiled, is known as dynamic binding."
Consider a scenario: when we use an Oracle Enterprise stage, we must provide the table and load its metadata, so the table name must be statically bound. The Oracle authentication, however, need not be statically bound, and for security reasons it should not be. For this we can use job parameters, which provide the authentication values at runtime.
Job parameters: "Job parameters are a technique for passing values at runtime; this is also called dynamic binding." Job parameters are divided into two types:
o Local variables
o Global variables
Local variables (params): "they are created in the DS Designer only, and can be used within that job only".
Global variables: "they are also called environment variables". They are divided into two types:
o Existing: they come with DataStage, in two groups, general and parallel. Under parallel, the compiler, operator-specific, and reporting variables are available.
o User Defined: they are created in the DataStage Administrator only.

NOTE: "Local parameters created in one job cannot be reused in another job; this applies up to version 7. In version 8 we can reuse them with a technique called a parameter set." In version 7 we can also reuse parameters as user-defined values in the DataStage Administrator.
Q: How do we give runtime values using parameters for the following list?
a. Give runtime values for user ID, password, and remote server?
b. Keep the department number (DNO) as a constraint, and at runtime select any number from a list to display?
c. Add BONUS to SAL + COMM at runtime?
d. Provide the target file name at runtime?
e. Reuse the global variables and the parameter set?
Design:

ORACLE -> Tx -> Data Set

Step 1: "Create job parameters for the given question as local variables."
Job Parameters:
  Name      DName     Type       Default value
  UID       USER      String     SCOTT
  PWD       Password  Encrypted  ******
  RS        SERVER    String     ORACLE
  DNO       DEPT      List       10
  BONUS     BONUS     Integer    1000
  IP        DRIVE     String     C:\
  FOLDER    FOLDER    String     Repository\
  TRGFILE   TARGET    String     dataset.ds
Here, UID/PWD/RS, DNO, BONUS, and IP/FOLDER/TRGFILE are the solutions to parts a, b, c, and d of the question.
Step 2: "Create global job parameters and a parameter set."

DS Administrator -> select a project -> Properties -> General -> Environment Variables -> User Defined (there we can write parameters):
  Name   DName     Type       Default value
  UID    USER      String     SCOTT
  PWD    Password  Encrypted  ******
  RS     SERVER    String     ORACLE
Here, global parameters are preceded by the $ symbol. To reuse them we must:
o Add Environment Variables -> User Defined: UID -> $UID, PWD -> $PWD, RS -> $RS

Step 3: "Create a parameter set for multiple values, providing UID, PWD, and the other values for DEV, PRD, and TEST."
In local variables (Job Parameters): select multiple values by clicking, and create a parameter set:
o Provide a name for the set: SUN_ORA
o Save it in Table Definitions
In Table Definitions:
o Edit the SUN_ORA values to add:
  Name   UID     PWD     SERVER
  DEV    SYSTEM  ******  SUN
  PRD    PRD     ******  ORACLE
  TEST   TEST    ******  MOON
To reuse this in another job:
o Add Parameter Set (in Job Parameters) -> Table Definitions -> Navs -> SUN_ORA (select it here to use it).
NOTE: "A parameter set can be used by jobs within the same project only."
Step 4: "In the Oracle Enterprise properties, select the table name and then assign the created job parameters as shown below."
Properties:
o Read method = Table
o Table = EMP
Connection:
o Password = #PWD#
o User = #UID#
o Remote Server = #RS#
Columns: Load -> metadata for the EMP table
Parameters -> Insert Job Parameter:
o Global (environment) variables: $UID, $PWD, $RS
o Parameter set: SUN_ORA.UID, SUN_ORA.PWD, SUN_ORA.RS
o Local variables: UID, PWD

Step 5: "In the Tx properties, use DNO as a constraint and assign BONUS to the BONUS column."
Input IN: EID, ENAME, STATE, SAL, COMM, DEPTNO
Stage variable (Derivation -> Variable):
  IN.SAL + NullToZero(IN.COMM) -> NS
Output OUT, Constraint: IN.DEPTNO = DNO
  Derivation -> Column:
  IN.EID -> EID
  IN.ENAME -> ENAME
  NS -> NETSAL
  NS + BONUS -> BONUS

Here, DNO and BONUS are the job parameters we created above. To use them, simply right-click -> Job Parameters -> DNO/BONUS (choose what you want).
Step 6: "Set the target file at runtime", following the steps below:
Data Set properties:
o Target file = #IP##FOLDER##TRGFILE#
When we run the job, it asks for the drive and the folder, and finally for the target file name you want.
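Conceptually, each #NAME# reference is replaced by the value supplied at runtime. The Python sketch below only illustrates that substitution idea. resolve is a hypothetical helper, not DataStage's own implementation, and the regular expression for parameter names is an assumption.

```python
import re

def resolve(template: str, params: dict) -> str:
    """Replace each #NAME# reference with the value supplied at run time."""
    def value_for(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"no run-time value for job parameter {name}")
        return str(params[name])
    # Names may contain dots, e.g. SUN_ORA.UID from a parameter set.
    return re.sub(r"#([A-Za-z_][A-Za-z0-9_.]*)#", value_for, template)

runtime_values = {"IP": "C:\\", "FOLDER": "Repository\\", "TRGFILE": "dataset.ds"}
print(resolve("#IP##FOLDER##TRGFILE#", runtime_values))
```

Adjacent references such as #IP##FOLDER# resolve independently, which is why the target path can be assembled from three separately prompted values.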

DAY 31 Sort Stage (Processing Stage)
Q: What is sorting?
"Here, sorting means more than what we usually understand by sorting."
Q: Why sort the data?
"To provide sorted data to stages such as Join, Aggregator, Merge, and Remove Duplicates, for good performance."
Two types of sorting:
1. Traditional sorting: "a simple sort, arranging the data in ascending or descending order".
2. Complex sorting: "only for sort stages; used to create group IDs, block unwanted sorting, and sort group-wise".
In DataStage we can sort at three levels:
o Source level: "only possible in a database".
o Link level: "can be used for traditional sorting".
o Stage level: "can be used for traditional as well as complex sorting".
Q: What is the best level to sort at, considering performance?
"Link-level sort is the best."
Source level sort:
o It can be done only in a database stage, such as Oracle Enterprise.
o How is it done in Oracle Enterprise (OE)? Go to the OE properties -> select User-Defined SQL -> Query: select * from EMP order by DEPTNO;
Link level sort:
Here the sorting is done on the link, as in the design: OE -> JOIN -> DS
o It is used for traditional sorting only.
o Link sort is the best sort in terms of performance.
Q: How do we perform a link sort?
"As per the above design, open the JOIN properties." Go to Partitioning -> select the partition technique (the default is 'Auto') -> mark "Perform sort".
o When we select Unique, it removes duplicates.
o When we select Stable, it keeps records with equal keys in their original order.
Q: Send all unique records to target1 and the remaining records to target2?
"For this we must create a group ID, which identifies the group."
This is done in the Sort stage: in the Sort stage properties, under Options, set Create Key Change Column (CKCC) = "true" (the default is false). Here we must select the column on which you want the group ID.
Sort Stage: "It is a processing stage that can sort the data with a traditional sort or a complex sort."
Complex sort means creating group IDs, blocking unwanted sorting, and group-wise sorting for stages like Join, Merge, Aggregator, and Remove Duplicates. Traditional sort means sorting in ascending or descending order.
Sort Properties:
o Input properties:
  Sorting key = EID (select the column from the source table)
  Key mode = Sort (Sort / Don't Sort (Previously Sorted) / Don't Sort (Previously Grouped))
  Options: Create Cluster Key Change Column = false (true / false); Create Key Change Column = (true / false)
o Output properties: mapping should be done here.
True = enables the group ID; False = disables the group ID.
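The key change column can be mimicked in a few lines of Python. The flag is 1 for the first row of each key group and 0 for the rest, so a downstream filter can split first occurrences from duplicates. sort_with_key_change is a hypothetical helper, and the rows are illustrative.

```python
from itertools import groupby
from operator import itemgetter

def sort_with_key_change(rows, key):
    """Sort rows on `key` and add keyChange: 1 for the first row of each group, else 0."""
    result = []
    ordered = sorted(rows, key=itemgetter(key))   # Python's sort is stable
    for _, group in groupby(ordered, key=itemgetter(key)):
        for position, row in enumerate(group):
            result.append({**row, "keyChange": 1 if position == 0 else 0})
    return result

rows = [{"EID": 111}, {"EID": 222}, {"EID": 111}, {"EID": 333}, {"EID": 222}]
sorted_rows = sort_with_key_change(rows, "EID")
target1 = [r["EID"] for r in sorted_rows if r["keyChange"] == 1]   # first of each group
target2 = [r["EID"] for r in sorted_rows if r["keyChange"] == 0]   # remaining duplicates
print(target1, target2)
```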

DAY 32 A Transformer & Sort stage Job
Q: Sort the given file, collect all the account types of each unique record into one column, and put the count of the account types into a new column.
File.txt:
    EID, ENAME, ACCTYPE
    111, munna, savings
    333, naveen, loans
    222, kumar, credit
    111, munna, current
    222, kumar, loans
    111, munna, insurance
    333, naveen, current
    111, munna, loans
    222, kumar, savings
Design: SF -> Sort1 -> Tx -> Sort2 -> DS
Sequential File (SF): reads file.txt for the process.
Sort1: here, sorting key = EID, and CKCC is enabled for the group ID.
Transformer (TX): here the logic for the target is implemented.
o Properties of TX:
  Input IN2: EID, ENAME, ACCTYPE, keyChange
  Stage variables (Derivation -> Variable):
    if (IN2.keyChange = 1) then IN2.ACCTYPE else func1 : ',' : IN2.ACCTYPE -> func1
    if (IN2.keyChange = 1) then 1 else c + 1 -> c
  Output OUT (Derivation -> Column):
    IN2.EID -> EID
    IN2.ENAME -> ENAME
    func1 -> ACCTYPE
    c -> COUNT
With this logic, the output looks like this:
    EID, ENAME, ACCTYPE, COUNT
    111, munna, savings, 1
    111, munna, savings, current, 2
    111, munna, savings, current, insurance, 3
    111, munna, savings, current, insurance, loans, 4
    222, kumar, credit, 1
    222, kumar, credit, loans, 2
    222, kumar, credit, loans, savings, 3
    333, naveen, current, 1
    333, naveen, current, loans, 2
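A Python sketch of the Sort1 + Transformer logic, followed by the final "one row per EID" step.

The notes perform that final step with the Sort2 and Data Set unique settings. This sketch swaps in a plain dictionary that keeps the row with the highest COUNT per EID. Two further assumptions: a stable sort keeps file order within each EID (so 333 yields "loans,current"), and the helper names are hypothetical.

```python
from operator import itemgetter

def accumulate_accounts(rows):
    """Sort1 (key EID) + Transformer stage variables func1 and c."""
    out, func1, c, previous = [], "", 0, None
    for row in sorted(rows, key=itemgetter("EID")):    # stable sort on EID
        key_change = row["EID"] != previous              # keyChange = 1 on a new EID
        previous = row["EID"]
        func1 = row["ACCTYPE"] if key_change else func1 + "," + row["ACCTYPE"]
        c = 1 if key_change else c + 1
        out.append({"EID": row["EID"], "ENAME": row["ENAME"],
                    "ACCTYPE": func1, "COUNT": c})
    return out

def keep_highest_count(rows):
    """Final step: one row per EID, the one with the highest COUNT."""
    best = {}
    for row in rows:
        if row["EID"] not in best or row["COUNT"] > best[row["EID"]]["COUNT"]:
            best[row["EID"]] = row
    return [best[eid] for eid in sorted(best)]

lines = ["111,munna,savings", "333,naveen,loans", "222,kumar,credit",
         "111,munna,current", "222,kumar,loans", "111,munna,insurance",
         "333,naveen,current", "111,munna,loans", "222,kumar,savings"]
rows = [dict(zip(("EID", "ENAME", "ACCTYPE"), line.split(","))) for line in lines]

running = accumulate_accounts(rows)
final = keep_highest_count(running)
for r in final:
    print(r["EID"], r["ENAME"], r["ACCTYPE"], r["COUNT"])
```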

Sort2: in its properties we must set the following:
o Stage: Key = ACCTYPE; Sort key mode = Sort; Sort order = Descending order
o Input: Partition type = Hash; Sorting: Perform sort, Stable (uncheck), Unique (check this); Selected: Key = count; Usage = Sorting, Partitioning; Options = Ascending, Case sensitive
o Output: mapping should be done here.
Data Set (DS):
o Input: Partition type = Hash
o Sorting:

  Perform sort, Stable (check this), Unique (check this)
  Selected: Key = EID; Usage = Sorting, Partitioning; Ascending
Final output:
    EID, ENAME, ACCTYPE, COUNT
    111, munna, sav, curr, insu, loans, 4
    222, kumar, credit, loans, sav, 3
    333, naveen, current, loans, 2

DAY 33 FILTER STAGE
Filter means "blocking the unwanted data". In DataStage, filtering can be performed at three levels:
1. Source level 2. Stage level 3. Constraints
Source Level Filter: "it can be done in a database as well as in a file, at source level".
o Database: by writing filter queries like "select * from EMP where DEPTNO = 10".
o Source File: here we have an option called Filter, where we can write filter commands like grep "moon", grep -i "moon", grep -w "moon".
Stage Filter: "Stage filtering uses three stages: 1. Filter, 2. Switch, and 3. External Filter."

o Differences between IF and SWITCH:
IF:
o Poor performance.
o IF can use 'n' columns in a condition.
o It can have 'n' cases.
SWITCH:
o Better performance than IF.
o Only one condition (on a single column) can be performed.
o It can only have 128 cases.

Here, Filter is like IF, and Switch is like SWITCH.

o Differences between the three filter stages:
FILTER: condition on multiple columns; it has 1 input, n outputs, 1 reject.
SWITCH: condition on a single column; it has 1 input, 128 outputs, 1 default.
EXTERNAL FILTER: it works with GREP commands; it has 1 input, 1 output, no rejects.

Filter stage: "it has one input, n outputs, and one reject link". The symbol of Filter is [icon not reproduced].
Q: How does the Filter stage send data from the source to the targets?
Design: OE -> Filter -> DS (output link 1), DS (output link 2), DS (reject)
Step 1: Connect to Oracle to extract the EMP table.
Step 2: Filter properties:
o Predicates:
  Where clause = DEPTNO = 10, Output link = 1
  Where clause = SAL > 1000 and SAL < 3000, Output link = 2
o Output rejects = true (for the output reject data)
o Link Ordering: the order of the output links
o Output: mapping should be done for the links to each target.
Step 3: "Assign target file names in the targets." Mapping for T1 and T2 should be done separately.
The Filter stage has no reject link of its own; we must convert one of its links into a reject link, because it has 'n' outputs.
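A Python sketch of the Filter stage routing described above. Each where clause feeds its own output link, and rows that match no clause go to the reject link. A row can reach several links unless "Output Rows Only Once" is set (used in Job 3 below). filter_stage and the EMP rows are hypothetical; this models the behaviour described in the notes, not the stage's implementation.

```python
def filter_stage(rows, predicates, output_rows_only_once=False):
    """Route rows like a Filter stage.

    predicates: list of (output_link, where_clause) pairs in link order.
    Returns (rows per output link, reject rows).
    """
    outputs = {link: [] for link, _ in predicates}
    rejects = []
    for row in rows:
        matched = False
        for link, where in predicates:
            if where(row):
                outputs[link].append(row)
                matched = True
                if output_rows_only_once:
                    break              # only the first matching link gets the row
        if not matched:
            rejects.append(row)        # Output Rejects = true
    return outputs, rejects

emp = [{"EID": 1, "DEPTNO": 10, "SAL": 800},
       {"EID": 2, "DEPTNO": 20, "SAL": 1500},
       {"EID": 3, "DEPTNO": 10, "SAL": 2000},
       {"EID": 4, "DEPTNO": 30, "SAL": 5000}]
where_clauses = [(1, lambda r: r["DEPTNO"] == 10),
                 (2, lambda r: 1000 < r["SAL"] < 3000)]

outs, rej = filter_stage(emp, where_clauses)
print({k: [r["EID"] for r in v] for k, v in outs.items()}, [r["EID"] for r in rej])
```

Employee 3 satisfies both clauses, so it appears on both links by default and only on link 1 when output_rows_only_once is true.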
DAY 34 Jobs on Filter and properties of the Switch stage
Assignment Job 1:
a. Only DEPTNO 10 to target1?
b. Records satisfying SAL > 1000 and SAL < 3000 to target2?
c. Only DEPTNO 20 with where clause = SAL < 1000 and SAL > 3000 to target3?
d. Reject data to target4?
Design for JOB 1: EMP_TBL -> Filter stages -> targets T1 to T4
Step 1: "For target1: in the Filter, the where clause for target1 is DEPTNO = 10, and link order = 0."
Step 2: "For target2: where clause = SAL > 1000 and SAL < 3000, and link order = 1."
Step 3: "For target3: where clause = DEPTNO = 20, and link order = 0."
Step 4: "For target4: convert a link into a reject link and set Output Rejects = true."
Job 2:
a. All records from the source to target1?
b. Only DEPTNO = 30 to target2?
c. Where clause = SAL < 1000 and SAL > 3000 to target3?
d. Reject data to target4?
Design for JOB 2: EMP_TBL -> Copy -> T1, and Copy -> Filter -> T2, T3, T4
Step 1: "For target1: mapping should be done for this output link."
Step 2: "For target2: where clause = DEPTNO = 30, and link order = 0."
Step 3: "For target3: where clause = SAL < 1000 and SAL > 3000, and link order = 1."
Step 4: "For target4: convert a link into a reject link and set Output Rejects = true."

Job 3:
a. All unique records of DEPTNO to target1?
b. All duplicate records of DEPTNO to target2?
c. All records to target3?
d. Only DEPTNO 10 records to target4?
e. Records with SAL > 1000 & SAL < 3000, but not DEPTNO = 10, to target5?
Design for JOB 3: EMP_TBL -> Filter -> targets T1 to T5
Step 1: "For target1: where clause = keyChange = 1, and link order = 0."
Step 2: "For target2: where clause = keyChange = 0, and link order = 1."
Step 3: "For target3: mapping should be done for this output link."
Step 4: "For target4: where clause = DEPTNO = 10."
Step 5: "For target5: in the Filter properties, set Output Rows Only Once = true, with where clause SAL > 1000 & SAL < 3000."
SWITCH Stage: "Condition on a single column; it has only 1 input, 128 outputs, and 1 default." Picture of the Switch stage: [not reproduced].

Properties of the Switch stage:
o Input: Selector column = DEPTNO
o Cases (value = link order):
  Case = 10 = 0
  Case = 20 = 1
o Options: If not found = (Drop / Fail / Output)
  Drop: drops the data and continues the process.
  Fail: if any record is dropped, the job aborts.
  Output: lets us view the reject data through the link.
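A Python sketch of the Switch stage with its three "If not found" options. switch_stage, the option strings and the EMP rows are hypothetical illustrations of the behaviour described above, not DataStage settings syntax.

```python
def switch_stage(rows, selector, cases, if_not_found="drop"):
    """Route rows on one selector column; `cases` maps a value to an output link."""
    if if_not_found not in ("drop", "fail", "output"):
        raise ValueError(f"unknown option: {if_not_found}")
    outputs = {link: [] for link in cases.values()}
    rejects = []
    for row in rows:
        link = cases.get(row[selector])
        if link is not None:
            outputs[link].append(row)
        elif if_not_found == "output":
            rejects.append(row)                    # viewed through the reject link
        elif if_not_found == "fail":
            raise RuntimeError(f"no case for {selector}={row[selector]!r}: job aborts")
        # "drop": the row is discarded and processing continues
    return outputs, rejects

emp = [{"EID": 1, "DEPTNO": 10}, {"EID": 2, "DEPTNO": 20}, {"EID": 3, "DEPTNO": 30}]
cases = {10: 0, 20: 1}        # Case = 10 = 0, Case = 20 = 1
outs, rej = switch_stage(emp, "DEPTNO", cases, if_not_found="output")
print({k: [r["EID"] for r in v] for k, v in outs.items()}, [r["EID"] for r in rej])
```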

DAY 35 External Filter and Combining
External Filter: "It is a processing stage that filters using UNIX commands." It has 1 input, 1 output, and 1 reject link. To process a text file, each line must first be read as a single record in the input. Example filter command: grep "new york".
Design: Sequential File -> External Filter -> Data Set
External Filter properties:
o Filter command = grep "new york"
o grep -v "new york": keeps only the lines that do not contain new york.
Combining: "In DataStage, combining can be done in three ways." They are:
o Horizontal combining
o Vertical combining
o Funnel combining
Horizontal combining: combining primary rows with secondary rows based on a primary key.
o It is performed by the JOIN, LOOKUP, and MERGE stages. These three stages differ from each other with respect to:
o input requirements,
o treatment of unmatched records, and
o memory usage.

DAY 36 Horizontal Combining (HC) and Description of HC stages
Horizontal Combining (HC): "combining the primary rows with secondary rows based on the primary key". The choice of primary table depends on the situation.

Example:
  EMP: ENO, EName, DNo (111 naveen 10; 222 munna)
  DEPT: DNo, DName, LOC (10 IT HYD; 20 SE SEC; 40 SA)
  Combined: DNO, DNAME, LOC, ENO, ENAME
Here we can combine with an inner join, left outer join, right outer join, and full outer join.
If T1 = {10, 20, 30} and T2 = {10, 20, 40}:
Inner Join: "matched primary and secondary records": T1 ∩ T2 = {10, 20}
Left Outer Join: "matched primary & secondary records, and unmatched primary records": (T1 ∩ T2) ∪ (T1 - T2) = {10, 20, 30}
Right Outer Join: "matched primary & secondary records, and unmatched secondary records": (T1 ∩ T2) ∪ (T2 - T1) = {10, 20, 40}
Full Outer Join: "matched primary & secondary records, unmatched primary records, and unmatched secondary records": T1 ∪ T2 = {10, 20, 30, 40}
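The four join types can be checked on the key sets T1 and T2 with Python set operations. join_keys is a hypothetical helper that only tracks which keys survive each join, not the joined rows.

```python
def join_keys(t1: set, t2: set, how: str) -> set:
    """Keys present in the result of each join type."""
    matched = t1 & t2
    if how == "inner":
        return matched
    if how == "left":
        return matched | (t1 - t2)      # plus unmatched primary keys
    if how == "right":
        return matched | (t2 - t1)      # plus unmatched secondary keys
    if how == "full":
        return t1 | t2
    raise ValueError(f"unknown join type: {how}")

T1, T2 = {10, 20, 30}, {10, 20, 40}
for how in ("inner", "left", "right", "full"):
    print(how, sorted(join_keys(T1, T2, how)))
```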

Description of HC stages: "The description of horizontal combining is divided into nine parts." They are:
o Input names,
o Inputs, outputs, and rejects,
o Join types,
o Input requirements with respect to sorting,
o De-duplication (removing duplicates),
o Treatment of unmatched records,
o Memory usage,
o Key column names, and
o Types of inner join.

The differences between JOIN, LOOKUP, and MERGE with respect to the above nine points are shown below.
Input names:
  JOIN: when we do HC with JOIN, the first SRC is the left table, the last SRC is the right table, and all middle SRCs are intermediate tables.
  LOOKUP: the first link from the source is the primary (input) link, and the remaining links are lookup (reference) links.
  MERGE: the first table is the master table, and the remaining tables are update tables.
Inputs, outputs, and rejects:
  JOIN: n inputs (inner, LOJ, ROJ), 2 inputs (FOJ), 1 output, no reject.
  LOOKUP: n inputs (normal), 2 inputs (sparse), 1 output, and 1 reject.
  MERGE: n inputs, 1 output, (n - 1) rejects.
Join types:
  JOIN: inner join, left outer join, right outer join, and full outer join.
  LOOKUP: inner join, left outer join.
  MERGE: inner join, left outer join.
Input requirements with respect to sorting:
  Primary: JOIN mandatory; LOOKUP optional; MERGE mandatory.
  Secondary: JOIN mandatory; LOOKUP optional; MERGE mandatory.
De-duplication (removing duplicates):
  Primary: JOIN OK (nothing happens); LOOKUP OK; MERGE warnings.
  Secondary: JOIN OK; LOOKUP warnings; MERGE OK.
Treatment of unmatched records:
  Primary: JOIN drop (inner), target (left); LOOKUP drop, target (continue), reject (unmatched primary records); MERGE drop, target (keep).
  Secondary: JOIN drop (inner); LOOKUP drop; MERGE drop, reject (unmatched secondary records).
Memory usage: JOIN light memory; LOOKUP heavy memory; MERGE light memory.
Key column names: JOIN must be SAME; LOOKUP optional (same in the case of a lookup file set); MERGE must be SAME.
Type of inner join: JOIN ALL; LOOKUP ANY; MERGE ALL.

DAY 37 LOOKUP stage (Processing Stage)
Lookup stage: in real-time projects, 95% of horizontal combining is done with this stage.
"The Lookup stage is for cross-verification of primary records with secondary records."
DataStage version 8 supports four types of LOOKUP:
o Normal LOOKUP
o Sparse LOOKUP
o Range LOOKUP
o Caseless LOOKUP
For example, a simple job with the EMP and DEPT tables:
o Primary table: EMP, with columns EID, ENAME, DNO
o Reference table: DEPT, with columns DNO, DNAME, LOC
Design: EMP table (primary, input) -> LOOKUP <- DEPT table (reference, lookup); LOOKUP -> Data Set (target)
LOOKUP properties for the two tables:
  Primary table: ENO, ENAME, DNO
  Reference table: DNO, DNAME, LOC
  Target: ENO, ENAME, DNAME
Key column for both tables: set it simply by dragging DNO from the primary table onto the DNO column of the reference table.
The toolbar of the LOOKUP stage has a Constraints button; there we select:
o Continue: gives a Left Outer Join.
o Drop: gives an Inner Join.
o Fail: aborts the job if there are unmatched primary records.
o Reject: captures the unmatched primary records.
Caseless LOOKUP: by default the lookup is case sensitive during execution, but we have an option to remove the case sensitivity:
o Key type = Caseless.
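A Python sketch of a normal lookup with the four "not found" choices and an optional caseless key. It assumes the following:
o The reference rows fit in a dictionary (as in a normal, in-memory lookup).
o The first reference row wins when a key is duplicated.
o Unmatched rows under Continue get None for the reference columns.
The function name, option strings and data are hypothetical.

```python
def lookup(primary, reference, key, if_not_found="continue", caseless=False):
    """Normal (in-memory) lookup of primary rows against reference rows on `key`."""
    def norm(value):
        return value.lower() if caseless and isinstance(value, str) else value

    ref_index = {}
    for ref in reference:
        ref_index.setdefault(norm(ref[key]), ref)      # assumption: first row wins
    ref_columns = [c for c in (reference[0] if reference else {}) if c != key]

    output, rejects = [], []
    for row in primary:
        match = ref_index.get(norm(row[key]))
        if match is not None:
            output.append({**row, **{c: match[c] for c in ref_columns}})
        elif if_not_found == "continue":               # left outer join
            output.append({**row, **{c: None for c in ref_columns}})
        elif if_not_found == "drop":                   # inner join
            continue
        elif if_not_found == "reject":                 # capture unmatched primary rows
            rejects.append(row)
        elif if_not_found == "fail":
            raise RuntimeError(f"lookup failed for {key}={row[key]!r}: job aborts")
        else:
            raise ValueError(f"unknown option: {if_not_found}")
    return output, rejects

emp = [{"ENO": 111, "ENAME": "naveen", "DNO": 10},
       {"ENO": 222, "ENAME": "munna", "DNO": 20},
       {"ENO": 333, "ENAME": "kumar", "DNO": 50}]
dept = [{"DNO": 10, "DNAME": "IT", "LOC": "HYD"},
        {"DNO": 20, "DNAME": "SE", "LOC": "SEC"}]
out, rej = lookup(emp, dept, "DNO", if_not_found="reject")
print([(r["ENO"], r["DNAME"]) for r in out], [r["ENO"] for r in rej])
```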

DAY 38 Sparse and Range LOOKUP
Sparse LOOKUP: if the source is a database, it supports only two inputs.
Normal lookup: "cross-verification of primary records with secondary records in memory".
Sparse lookup: "cross-verification of primary records with secondary records at the source level itself".
To set a sparse lookup, we must set the key type to Sparse in the reference table only. By default, a Normal LOOKUP is done in the Lookup stage.
Note: a sparse lookup does not support another reference when the reference is a database. But in one case a sparse LOOKUP stage can support 'n' references: by using a lookup file set.
Job 1: a sequential file extracts a text file and loads it into a lookup file set (lfs).
Design: Sequential file -> Lookup file set
In the lookup file set properties:
o Column names should be the same as in the sequential file.
o The target file is stored with the .lfs extension.
o The address (path) of the target must be saved, to use it in another job.
Job 2: in this job we use the lookup file set as a sparse lookup.
Design: SF -> LOOKUP -> DS, with lookup file sets (LFS ... LFS) as references
In each lookup file set, we must paste the address of the above lfs. Lookup file sets support 'n' references, which means sparse indirectly supports 'n' references.

Navs notes

Page 120

DataStage

!Aange lookup is keeping condition in bet een the tables". 3o to set the range lookup1 In %66:4P properties1 Select the check bo/ for column you need to condition.
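A range lookup matches a primary value against lower and upper bounds held in the reference row instead of an exact key. This Python sketch assumes inclusive bounds and that unmatched rows pass through unchanged (like the Continue action); `range_lookup` and the salary-grade column names (LOSAL, HISAL, GRADE) are hypothetical.

```python
def range_lookup(primary, reference, value_col, low_col, high_col):
    """Attach the first reference row whose [low, high] range contains the value."""
    output = []
    for row in primary:
        value = row[value_col]
        match = next(
            (ref for ref in reference if ref[low_col] <= value <= ref[high_col]),
            None,
        )
        # Unmatched rows are passed through unchanged (Continue-style).
        output.append({**row, **match} if match is not None else dict(row))
    return output


if __name__ == "__main__":
    emp = [{"ENO": 1, "SAL": 1500}, {"ENO": 2, "SAL": 9000}]
    grades = [
        {"GRADE": 1, "LOSAL": 700, "HISAL": 1200},
        {"GRADE": 2, "LOSAL": 1201, "HISAL": 2000},
    ]
    print(range_lookup(emp, grades, "SAL", "LOSAL", "HISAL"))
```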

Condition for the LOOKUP stage: how to write a condition in the lookup stage?
- Go to the toolbar Constraints; there we see the condition box.
- In the condition, for example: in.primary = "AP"
For multiple links we can write multiple conditions, one for each of the 'n' references.

DAY 39 Funnel, Copy and Modify Stages

Funnel Stage: "It is a processing stage which combines multiple sources into one target". To use the funnel stage, these conditions must be met:
1. The columns should be the same.
2. The column names should also be the same.
3. Column names are case sensitive.
4. The data types should be the same.
The funnel stage appends the records of one table after another, but the above four conditions have to be met.

Simple example for the funnel stage:
[Figure: example source tables with columns such as ENO, ENAME, GEN and ADD (sample rows with ENO 111 and 222), and a target with columns EMPID, Ename, Loc, Company, GEN and Country. In this stage the GEN value M has to be exchanged into 1, and the column names are changed to match the primary table.]

The funnel operation has three modes:
- Continuous funnel: takes records in random order.
- Sequence: collects records based on the link order.
- Sort funnel: collects records based on the key column values.
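The three funnel modes can be sketched in Python over lists of dict rows. The metadata check covers only the column-name rules (same names, same order, case sensitive), not data types; all function names are hypothetical, and the continuous mode shows just one possible arrival order.

```python
import heapq
from itertools import chain, zip_longest


def check_metadata(*links):
    """Funnel inputs must have identical column names (same order, case sensitive)."""
    layouts = {tuple(row.keys()) for link in links for row in link}
    if len(layouts) > 1:
        raise ValueError(f"funnel inputs have different columns: {layouts}")


def sequence_funnel(*links):
    """Sequence mode: every record of link 1, then link 2, ... (link order)."""
    check_metadata(*links)
    return list(chain.from_iterable(links))


def sort_funnel(key, *links):
    """Sort funnel: merge inputs that are each already sorted on `key`."""
    check_metadata(*links)
    return list(heapq.merge(*links, key=lambda row: row[key]))


def continuous_funnel(*links):
    """Continuous mode takes records as they arrive, so order is not guaranteed.
    Taking one record from each link in turn is just one possible order."""
    check_metadata(*links)
    missing = object()
    output = []
    for group in zip_longest(*links, fillvalue=missing):
        output.extend(row for row in group if row is not missing)
    return output
```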

Copy Stage: "It is a processing stage which can be used for":
1. Copying source data to multiple targets.
2. Changing column names.
3. Dropping columns.
4. Acting as a stub stage.
NOTE: best for changing column names and dropping columns.

Modify Stage: "It is a processing stage which can perform":
1. Drop columns.
2. Keep columns.
3. Change column names.
4. Modify data types.
5. Alter the data.

Design: Oracle Enterprise -> Modify -> Data Set
From OE, the Modify stage sends the data into the data set, applying the five operations above.

In Modify properties:
- Specification: DROP SAL, MGR, DEPTNO
  Here the listed columns are dropped.
- Specification: KEEP SAL, MGR, DEPTNO
  Here the listed columns are kept and the remaining columns are dropped. (At runtime, view the operation process in Data Set Management.)
- Specification: <new column name>=<old column name>, e.g. DOJ=HIREDATE
  Here the column name is changed.
- Specification: <new column name>=<conversion function>(<old column name>), e.g. DOJ=DATE_FROM_TIMESTAMP(HIREDATE)
  Here the column name is changed together with its data type.
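The four kinds of Modify specification above (drop, keep, rename, rename with conversion) can be imitated on dict rows with this Python sketch. `apply_modify` is a hypothetical helper, and taking the date part of a Python datetime is only a rough stand-in for the DATE_FROM_TIMESTAMP conversion.

```python
from datetime import datetime


def apply_modify(rows, drop=(), keep=None, rename=None, convert=None):
    """Apply Modify-style specifications to dict rows.

    drop:    columns to remove                    (DROP SAL, MGR, DEPTNO)
    keep:    columns to retain, others removed     (KEEP SAL, MGR, DEPTNO)
    rename:  {new_name: old_name}                  (DOJ=HIREDATE)
    convert: {new_name: (old_name, function)}      (DOJ=DATE_FROM_TIMESTAMP(HIREDATE))
    """
    rename = rename or {}
    convert = convert or {}
    result = []
    for row in rows:
        new = dict(row)  # never modify the input rows
        for new_name, (old_name, func) in convert.items():
            new[new_name] = func(new.pop(old_name))
        for new_name, old_name in rename.items():
            new[new_name] = new.pop(old_name)
        for col in drop:
            new.pop(col, None)
        if keep is not None:
            new = {col: val for col, val in new.items() if col in keep}
        result.append(new)
    return result


if __name__ == "__main__":
    emp = [{"EMPNO": 7369, "SAL": 800, "MGR": 7902, "DEPTNO": 20,
            "HIREDATE": datetime(1980, 12, 17)}]
    print(apply_modify(emp, drop=["SAL", "MGR", "DEPTNO"]))
    print(apply_modify(emp, keep=["SAL", "MGR", "DEPTNO"]))
    print(apply_modify(emp, rename={"DOJ": "HIREDATE"}))
    # Rough stand-in for DATE_FROM_TIMESTAMP: keep only the date part.
    print(apply_modify(emp, convert={"DOJ": ("HIREDATE", lambda ts: ts.date())}))
```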

DAY 40 JOIN Stage (Processing Stage)

The Join stage is used for horizontal combining; it is described with respect to input requirements, treatment of unmatched records, and memory usage.
- Input names: left table, right table, and intermediate tables.
- Inputs and outputs: n inputs (inner, LOJ, ROJ), 2 inputs (FOJ), 1 output, no reject.
- Types of join: inner, left outer join, right outer join, and full outer join.
- Input requirements with respect to sorting: sorting is mandatory for both the primary and secondary tables.
- Input requirements with respect to de-duplication: nothing happens; it is OK if duplicates are not removed.
- Treatment of unmatched records: in the primary table, the Inner option simply drops them, and LOJ keeps all records in the target. In the secondary table, the Inner option drops them, and ROJ keeps all records in the target.
- Memory usage: light memory.
- Key column names should be the SAME in this stage.
- All types of inner join are supported.

A simple job for the JOIN stage:
JOIN properties: need a key column.
- Inner JOIN and Left Outer JOIN: the key column comes from the left table.
- Right Outer JOIN: the key column comes from the right table.
- Full Outer JOIN: the key column comes from both tables; there is no scope for a third table, which is why FOJ has only two inputs.

In the Join stage, if we sort on different key column names, the job still executes, but performance suffers (simply put, WARNINGS will occur).

We can change a column name in two ways: with a Copy stage, or with a query statement. Example of the SQL query:
select DEPTNO1 as DEPTNO, DN, LOC from DEPT;
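The four join types can be compared on two small inputs with this Python sketch. It assumes dict rows, a single key column with the same name on both sides (as the notes require), and None for missing columns. Unlike the real stage, which expects inputs sorted on the key, the sketch indexes the right input in a dict, so input order does not matter; `join` is a hypothetical helper.

```python
def join(left, right, key, how="inner"):
    """Two-input sketch of the Join stage types: inner, left, right, full."""
    if how not in ("inner", "left", "right", "full"):
        raise ValueError(f"unknown join type: {how}")
    right_index = {}
    for rrow in right:
        right_index.setdefault(rrow[key], []).append(rrow)
    left_cols = {c for row in left for c in row}
    right_cols = {c for row in right for c in row}

    output, matched_keys = [], set()
    for lrow in left:
        matches = right_index.get(lrow[key], [])
        if matches:
            matched_keys.add(lrow[key])
            output.extend({**rrow, **lrow} for rrow in matches)
        elif how in ("left", "full"):
            # Unmatched left row: right-side columns become None.
            output.append({**dict.fromkeys(right_cols), **lrow})
    if how in ("right", "full"):
        for rrow in right:
            if rrow[key] not in matched_keys:
                # Unmatched right row: left-side columns become None.
                output.append({**dict.fromkeys(left_cols), **rrow})
    return output
```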
DAY 41 MERGE Stage (Processing Stage)

The Merge stage is a processing stage that performs horizontal combining; it is described with respect to input requirements, treatment of unmatched records, and memory usage.
- Input names: master and updates.
- Inputs and outputs: n inputs, 1 output, and (n-1) rejects.
- Join types: inner join and left outer join.
- Input requirements with respect to sorting: it is mandatory to sort before performing the merge.
- Input requirements with respect to de-duplication: in the primary (master) table we get warnings if we don't remove the duplicates. In the secondary (update) tables nothing happens; it's OK if we don't remove the duplicates.
- Treatment of unmatched records: in the primary table, Drop drops the unmatched primary records and Keep sends them to the target. In the secondary tables, unmatched records are dropped, and a reject link captures the unmatched secondary table records.
- Memory usage: LIGHT memory.
- Key column names must be the SAME.
- Type of inner join: it compares against ANY of the update tables.

NOTE:
- Static information is stored in the master table.
- All change information is stored in the update tables.
- Merge operates with only two options: Keep (left outer join) and Drop (inner join).

Simple job for the MERGE stage:
Master table: PID, PRD_DESC, PRD_MANF (e.g. 11 indica tata; 22 swift maruthi)
Update table (U1): PID, PRD_SUPP, PRD_CAT
Update table (U2): PID, PRD_AGE, PRD_PRICE (e.g. 22 9 1200; 66 3 1500; 88 9 1020)

Design: Master, U1, U2 -> MERGE -> TRG, with reject links Reject (U1) and Reject (U2).

In MERGE properties:
- Merge has an inbuilt sort = (Ascending Order / Descending Order).
- The link order must be followed.
- Merge supports (n-1) reject links.
NOTE: there has to be the same number of reject links as update links, or zero reject links.
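The master/update behaviour, the Keep/Drop option and the one-reject-link-per-update rule can be sketched in Python as below. Assumptions, not confirmed by the notes: key values are unique within each input, update columns are filled with None when an update link has no match, and a master row counts as matched if ANY update link matches it. `merge` is a hypothetical helper.

```python
def merge(master, updates, key, unmatched_masters="keep"):
    """Sketch of the Merge stage: one master link plus n update links.

    Returns (output_rows, rejects), where rejects holds one list per
    update link (the n-1 reject links).
    """
    if unmatched_masters not in ("keep", "drop"):
        raise ValueError("unmatched_masters must be 'keep' or 'drop'")
    indexes = [{row[key]: row for row in upd} for upd in updates]
    update_cols = [{c for row in upd for c in row} for upd in updates]
    used = [set() for _ in updates]

    output = []
    for m in master:
        merged, matched = dict(m), False
        for i, index in enumerate(indexes):
            upd = index.get(m[key])
            if upd is None:
                for col in update_cols[i]:
                    merged.setdefault(col, None)
            else:
                merged.update(upd)
                used[i].add(m[key])
                matched = True
        # Assumption: a master row is matched if ANY update link matches it.
        if matched or unmatched_masters == "keep":
            output.append(merged)

    # Update rows that matched no master row go to that link's reject list.
    rejects = [[row for row in upd if row[key] not in used[i]]
               for i, upd in enumerate(updates)]
    return output, rejects
```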

Here the COPY stage acts as a STUB stage, which means it holds the data without sending it into the target.

DAY 42 Remove Duplicates & Aggregator Stages

Remove Duplicates: "It is a processing stage which removes the duplicates from a column and retains the first or last duplicate rows".
Design: Sequential File -> Remove Duplicates -> Data Set
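Keeping the first or last row per key, as the Remove Duplicates stage does, can be sketched in Python like this. The stage is normally fed sorted data; this hypothetical `remove_duplicates` helper uses a dict instead, so it does not need sorted input, and it returns rows in order of each key's first appearance.

```python
def remove_duplicates(rows, keys, retain="first"):
    """Keep one row per key combination: the first or the last one seen."""
    if retain not in ("first", "last"):
        raise ValueError("retain must be 'first' or 'last'")
    kept = {}
    for row in rows:
        k = tuple(row[c] for c in keys)
        if retain == "last" or k not in kept:
            kept[k] = row  # with 'last', later duplicates overwrite earlier ones
    return list(kept.values())
```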

Properties of Remove Duplicates: two options in this stage.
- Key column = <column name>
- Duplicate to retain = (first/last)
The Remove Duplicates stage supports 1 input and 1 output.
NOTE: for every stage with n inputs and n outputs, mapping must be done.

Aggregator: "It is a processing stage that performs a count of rows and different calculations between columns, i.e., the same operation as GROUP BY in Oracle".

Design: SF -> Aggregator -> DS
Properties of the Aggregator:
- Grouping keys: Group = Deptno
- Aggregation type = Count rows (Count rows / Calculation / Re-calculation)
- Count output column = count <column name>

Q1: Count the number of all records, and the number of records deptno-wise, in an EMP table?
Design: OE_EMP -> Copy of EMP -> Aggregator (counting rows of deptno) -> TRG1

Copy of EMP -> Generating a column -> Aggregator (counting rows of the created column) -> TRG2

For doing some group calculation between columns, for example:
- Select group key: Group = DEPTNO
- Aggregation type = Calculation
- Column for calculation = SAL <column name>
- Operations: Maximum value output column = max <new column name>; Minimum value output column = min <new column name>; Sum of column = sum <new column name>; and so on.
Here the calculation on SAL is done based on DEPTNO.

Q2: In target one, find the maximum, minimum and sum deptno-wise, and in target two, find the company-wise maximum?
Design: OE_emp -> copy of emp -> Aggregator (max, min, sum of deptno) -> trg1
copy of emp -> Company: IBM -> Aggregator (max of IBM) -> trg2

Q3: Find the max salary from the emp table of a company, and find all the details of that employee?
Q4: Find the max, min and sum of salary deptno-wise in an emp table?
Q3 & Q4 design: emp -> copy -> max(deptno) and min(deptno) aggregations, each with a dummy column (dno=10, dno=20, dno=30) -> compare -> UNION ALL (dividing); company: IBM -> max(IBM) -> compare the maximum SAL with his details.
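The Aggregator's count-rows and calculation modes behave like SQL GROUP BY with COUNT, MAX, MIN and SUM. This Python sketch assumes dict rows with numeric salary values; `aggregate` and `top_earners` are hypothetical helpers, and the second one only illustrates the compare-back idea behind Q3 (find the maximum, then match it against the detail rows), not a specific DataStage job.

```python
from collections import defaultdict


def aggregate(rows, group_key, calc_col=None):
    """Group rows on group_key; always count rows, optionally max/min/sum calc_col."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    result = []
    for value, members in groups.items():
        out = {group_key: value, "count": len(members)}
        if calc_col is not None:
            numbers = [m[calc_col] for m in members]
            out.update(max=max(numbers), min=min(numbers), sum=sum(numbers))
        result.append(out)
    return result


def top_earners(rows, sal_col="SAL"):
    """Find the maximum salary, then compare it back against every row
    to return the full details of the employee(s) earning it."""
    highest = max(row[sal_col] for row in rows)
    return [row for row in rows if row[sal_col] == highest]
```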

DAY 43 Slowly Changing Dimensions (SCD) Stage

Before SCD we must understand the types of loading:
1. Initial load
2. Incremental load

Initial load: a complete dump into the dimensions or data warehouse (i.e., the target); this "before" data is called the initial load.
Incremental load: the subsequent changes, coming from OLTP, i.e., the source's "after" data.

Example 1:
Before data (already in a table): CID 11, ADD HYD, GEN M, BALANCE 30000, Phone No, AGE 25
After data (updated and inserted at the source level): CID 11, ADD SEC, GEN M, BALANCE 60000, Phone No (changed), AGE 24
Column fields and their types of change:
- Address: slowly changing
- Balance: rapidly changing
- Phone No: often changing
- Age: frequently changing

Example 2:
Before data: CID 11, 22, 33 with ADD HYD, SEC, DEL
After data (loading a table with the update 'n' insert option): CID 11, 22, 33 with ADD HYD, CUL, PUN

We have SIX types of SCDs: SCD-I, SCD-II, SCD-III, SCD-IV (or V), and SCD-VI.

Explanation:
SCD-I: "it only maintains the current update, and no historical data is organized". As per SCD-I, the before data is updated with the after data, and no history is present after the execution.

SCD-II: "it maintains both the current update data and the historical data". It uses some special operation columns: surrogate key, active flag, effect start date, and effect end date.
- In SCD-II there is no usable primary key, so a system-generated primary key is needed, i.e., a surrogate key. Here the surrogate key acts as the primary key.
- When SCD-II is performed, we get a practical problem: identifying the old and the current record. That we can solve with the active flag: "Y" or "N".
- In SCD-II, new concepts are introduced, i.e., effect start date (ESDATE) and effect end date (EEDATE).
- Record version: a concept used in some conditions, when ESDATE and EEDATE cannot be used.
- Unique key: the unique key is derived by comparing.

SCD-III: SCD-I + SCD-II: "maintains the history but no duplicates".

Extracting the after and before data from the DW (or) database to compare and upsert.

SCD-IV or V: SCD-II + record version: "When we do not maintain the date version, the record version is useful".

SCD-VI: SCD-I + unique identification.

Example table of SCD data:
SID | CID | CNAME | ADD | AF | ESDATE   | EEDATE     | RV | UID
1   | 11  | A     | HYD | N  | 03-06-06 | 29-11-10   | 1  | 1
2   | 22  | B     | SEC | N  | 03-06-06 | 07-09-07   | 1  | 2
3   | 33  | C     | DEL | Y  | 03-06-06 | 9999-12-31 | 1  | 3
4   | 22  | B     | DEL | N  | 08-09-07 | 29-11-10   | 2  | 2
5   | 44  | D     | MCI | Y  | 08-09-07 | 9999-12-31 | 1  | 5
6   | 11  | A     | DDH | Y  | 30-11-10 | 9999-12-31 | 2  | 1
7   | 22  | B     | RAK | Y  | 30-11-10 | 9999-12-31 | 3  | 2
8   | 55  | E     | CUL | Y  | 30-11-10 | 9999-12-31 | 1  | 8

Table: this table describes the six SCD types explained above.

DAY 44 SCD I & SCD II (Design and Properties)

SCD-I: Type 1 (Design and Properties):
Transfer job: OE_DIM (before: 10, 20, 30) and OE_SRC (after) -> SCD1 -> DS_FACT (10, 20, 40) and DS_TRG_DIM (after dim: 10, 20, 40)
Load job: DS_TRG_DIM (10, 20, 40) -> OE_UPSERT (update and insert)

In Oracle we have to create table1 and table2:
Table1 (SRC, the after/source data):
Create table SRC(SNO number, SNAME varchar2(25));
Insert into src values(111, 'naveen');
Insert into src values(222, 'munna');
Insert into src values(333, 'kumar');
Table2 (DIM, the before/dimension data):
Create table DIM(SKID number, SNO number, SNAME varchar2(25));

No records to display.

Process of the transfer job SCD1:
Step 1: Load the plug-in metadata from Oracle for the before and after data, as shown in the links above that come from different sources.
Step 2: "SCD1 properties":
Fast path 1 of 5: select the output link as: fact.
Fast path 2 of 5: navigate the key column value between the before and after tables.
Key expression mapping (BEFORE: SKID, SNO, SNAME; AFTER: SNO, SNAME):
- SKID: surrogate key
- SNO: business key (key expression AFTER.SNO)
Fast path 3 of 5: select the source type and source name.
- Source name: D:\study\navs\empty.txt
- Source type: Flat file
NOTE: every time we run the program we should empty the source name file, i.e., empty.txt; otherwise the surrogate key will continue from the last stored value.
Fast path 4 of 5: select the output in DIM.

DIM mapping (from AFTER: SNO, SNAME):
Derivation -> Column name -> Purpose
next sk() -> SKID -> surrogate key
AFTER.SNO -> SNO -> business key

Fast path 5 of 5: set the output paths to the FACT data set.
FACT mapping (from BEFORE: SKID, SNO, SNAME and AFTER: SNO, SNAME):
BEFORE.SKID -> SKID
AFTER.SNO -> SNO

Step 3: In the next job, i.e., the load job, if we change or edit the source table, then when loading into Oracle we must set the write method = upsert. There we have two options:
- update then insert // if the key column value already exists
- insert then update // if the key column value is new

Here is the SCD-I result for the input below:
Before table:
CID | CNAME | SKID
10  | abc   | 1
20  | xyz   | 2
30  | $*    | 3

Target dimensional table of SCD-I:
CID | CNAME | SKID
10  | abc   | 1
20  | nav   | 2
40  | $*    | 3

After table:
CID | CNAME
10  | abc
20  | nav
40  | $*
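SCD-I keeps only the current values: a changed record is overwritten under its existing surrogate key, and a new business key gets the next surrogate key. This Python sketch assumes dict rows and passes the next surrogate key in explicitly (the stage itself reads it from its state file, which is why empty.txt matters), so its key numbering is not meant to reproduce the exact SKID values in the tables above. `scd1` and `upsert` are hypothetical helpers; `upsert` stands in for the load job's update-then-insert write method.

```python
def scd1(dimension, source, business_key, next_key):
    """Type 1 change handling.

    Returns (fact_rows, dim_upserts, next_key): every source row gets a fact
    row with its surrogate key; only new or changed rows are sent to the
    dimension, overwriting the old values (no history is kept).
    """
    current = {row[business_key]: row for row in dimension}
    fact, upserts = [], []
    for src in source:
        existing = current.get(src[business_key])
        if existing is None:
            new_row = {"SKID": next_key, **src}  # new business key: next surrogate key
            next_key += 1
            upserts.append(new_row)
            current[src[business_key]] = new_row
            skid = new_row["SKID"]
        else:
            if any(existing.get(col) != val for col, val in src.items()):
                upserts.append({**existing, **src})  # overwrite in place
            skid = existing["SKID"]
        fact.append({"SKID": skid, business_key: src[business_key]})
    return fact, upserts, next_key


def upsert(table, rows, key="SKID"):
    """Load-job step: update rows whose key exists, insert the others."""
    merged = {row[key]: dict(row) for row in table}
    for row in rows:
        merged[row[key]] = {**merged.get(row[key], {}), **row}
    return list(merged.values())
```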

SCD-II: (Design and Properties):
Transfer job: OE_DIM (before: 10, 20, 30) and OE_SRC (after) -> SCD2 -> DS_FACT (10, 20, 20, 30, 40) and DS_TRG_DIM (after dim: 10, 20, 20, 30, 40)
Load job: DS_TRG_DIM -> OE_UPSERT (update and insert)

Step 1: in the transformer stage:
Add some columns to the before table, and convert the EEDATE and ESDATE columns into timestamps, in a transformer stage placed before the SCD II stage.
In the TX properties (BEFORE -> BEFORE_TX: SKID, SNO, SNAME, ESDATE, EEDATE, ACF):
BEFORE.SKID -> SKID
BEFORE.SNO -> SNO
BEFORE.SNAME -> SNAME

In SCD II properties:
Fast path 1 of 5: select the output link as: fact.
Fast path 2 of 5: navigate the key column value between the before and after tables.
Key expression mapping (BEFORE: SKID, SNO, SNAME, ESDATE, EEDATE, ACF; AFTER: SNO, SNAME):
- SKID: surrogate key
- SNO: business key (key expression AFTER.SNO)
- SNAME: Type 2
- ESDATE: effective date
Fast path 3 of 5: select the source type and source name.
- Source name: D:\study\navs\empty.txt
- Source type: Flat file
NOTE: every time we run the program we should empty the source name file, i.e., empty.txt; otherwise the surrogate key will continue from the last stored value.
Fast path 4 of 5: select the output in DIM.
Derivation -> Column name -> Purpose
next sk() -> SKID -> surrogate key
AFTER.SNO -> SNO -> business key
AFTER.SNAME -> SNAME -> Type 2
current date() -> ESDATE -> effective date
Expires: DateFromJulianDay(JulianDayFromDate(CurrentDate()) - 1)

Fast path 5 of 5: set the output paths to the FACT data set.
FACT mapping (from BEFORE and AFTER):
BEFORE.SKID -> SKID
AFTER.SNO -> SNO
AFTER.SNAME -> SNAME
BEFORE.ESDATE -> ESDATE

Step 3: In the next job, i.e., the load job, if we change or edit the source table, then when loading into Oracle we must set the write method = upsert. There we have two options:
- update then insert // if the key column value already exists
- insert then update // if the key column value is new

Here is the SCD-II result for the input below:
Before table:
CID | CNAME | SKID | ESDATE   | EEDATE   | ACF
10  | abc   | 1    | 01-10-08 | 99-12-31 | Y
20  | xyz   | 2    | 01-10-08 |          |

Target dimensional table of SCD-II:
CID | CNAME | SKID | ESDATE   | EEDATE   | ACF
10  | abc   | 1    | 01-10-08 | 99-12-31 | Y
20  | xyz   | 2    | 01-10-08 | 09-12-10 | N
20  | nav   | 4    | 10-12-10 |          |

After table:
CID | CNAME
10  | abc
20  | nav
40  |
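SCD-II keeps history: when a tracked column changes, the active row is expired (EEDATE set to the day before the run, ACF = "N") and a new version is inserted with a new surrogate key, ESDATE = run date, EEDATE = 9999-12-31 and ACF = "Y". That matches the pattern in the table above, where a run on 10-12-10 closes the old CID 20 row on 09-12-10. The Python sketch assumes dict rows, Python dates, and an explicitly supplied next surrogate key; `scd2` and `HIGH_DATE` are hypothetical names.

```python
from datetime import date, timedelta

HIGH_DATE = date(9999, 12, 31)  # open-ended effect end date


def scd2(dimension, source, business_key, tracked, next_key, today):
    """Type 2 change handling with SKID, ESDATE, EEDATE and ACF columns.

    A changed record is expired (EEDATE = yesterday, ACF = 'N') and a new
    version is inserted with the next surrogate key. Returns (rows, next_key).
    """
    rows = [dict(r) for r in dimension]  # work on copies of the input rows
    active = {r[business_key]: r for r in rows if r["ACF"] == "Y"}
    for src in source:
        cur = active.get(src[business_key])
        if cur is not None and all(cur[c] == src[c] for c in tracked):
            continue  # unchanged: nothing to do
        if cur is not None:
            # Same idea as DateFromJulianDay(JulianDayFromDate(CurrentDate()) - 1).
            cur["EEDATE"] = today - timedelta(days=1)
            cur["ACF"] = "N"
        new_row = {"SKID": next_key, **src,
                   "ESDATE": today, "EEDATE": HIGH_DATE, "ACF": "Y"}
        next_key += 1
        rows.append(new_row)
        active[src[business_key]] = new_row
    return rows, next_key
```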

DAY 45 Change Capture, Change Apply & Surrogate Key Stages

Change Capture Stage: "It is a processing stage that captures whether a record from a table is a copy, an edit, an insert or a delete, by keeping a code column". Simple example of change capture: a job with a Change_capture stage.

Properties of Change Capture:
Change keys:
- Key = EID (key column name)
Change values:
- Value = ENAME
- Value = ADD
Options:
- Change mode = (Explicit keys & values / Explicit keys, values)
- Drop output for copy = (false/true), default false
- Drop output for delete = (false/true), default false
- Drop output for edit = (false/true), default false
- Drop output for insert = (false/true), default false
- Copy code = 0, Delete code = 2, Edit code = 3, Insert code = 1
- Code column name = <column name>
- Sort order = ascending order
- Log statistics = (false/true), default false

Change Apply Stage: "It is a processing stage that applies the changes to the records of a table".

Properties of Change Apply:
Change keys:
- Key = EID
Options:
- Change mode = Explicit keys & values
- Check value columns on delete = (false/true), default true
- Log statistics = false
- Code column name = <column name> // this has to be the SAME as in Change Capture for the apply operations
- Sort order = ascending order
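Change Capture and Change Apply are inverses: capture compares before and after rows on the key and tags each result with the codes listed above (copy 0, insert 1, delete 2, edit 3), and apply replays those codes on the before data to rebuild the after data. This Python sketch assumes dict rows, unique sortable keys, and that copies are not dropped; `change_capture` and `change_apply` are hypothetical helpers, not the stages' real implementation.

```python
COPY, INSERT, DELETE, EDIT = 0, 1, 2, 3  # code values listed in the notes


def change_capture(before, after, key, values, code_col="change_code"):
    """Compare before/after rows on `key` and tag each result with a change code."""
    old = {row[key]: row for row in before}
    new = {row[key]: row for row in after}
    changes = []
    for k in sorted(old.keys() | new.keys()):  # sort order = ascending
        b, a = old.get(k), new.get(k)
        if b is None:
            changes.append({**a, code_col: INSERT})
        elif a is None:
            changes.append({**b, code_col: DELETE})
        elif any(b[c] != a[c] for c in values):
            changes.append({**a, code_col: EDIT})
        else:
            changes.append({**a, code_col: COPY})
    return changes


def change_apply(before, changes, key, code_col="change_code"):
    """Apply captured changes to the before rows to rebuild the after rows."""
    result = {row[key]: dict(row) for row in before}
    for change in changes:
        row = {c: v for c, v in change.items() if c != code_col}
        if change[code_col] == DELETE:
            result.pop(row[key], None)
        else:  # INSERT, EDIT and COPY all carry the after-image of the row
            result[row[key]] = row
    return [result[k] for k in sorted(result)]
```

The same code column name must be used on both sides, which is why the notes insist it is the SAME in both stages.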

SCD II in version 7.5.x2. Design:
- Inputs: before.txt and after.txt -> Change Capture (key = EID; option: explicit keys & values) -> Transformer, with output links filtered on c=3 and c=all.
- Derivations: ESDATE = current date(), EEDATE = "9999-12-31", Key = EID, ACF = "Y".
- Derivations for before.txt:
  ESDATE = current date()
  EEDATE = if c=3 then DFJD(JDFD(CD())-1) else "9999-12-31"
  ACF = if (c=3) then "N" else "Y"
(DFJD, JDFD and CD abbreviate DateFromJulianDay, JulianDayFromDate and CurrentDate.)

SURROGATE KEY Stage:
In version 7.5.x2: the last value generated when the job was first compiled and run has to be identified in the surrogate key stage; for that reason, in version 7 we have to build another job to store the last generated value. That job's design in version 7.5.x2:
SF -> SK -> copy -> ds
copy -> Tail -> peek
In this job, a surrogate key stage generates system key column values that are like primary key values, but it generates them only on the first compile. By taking a Tail stage, we trace the last value and store it in the Peek stage, i.e., in a buffer. With that buffer value we can generate the sequence values that are the surrogate keys in version 7.5.x2.

In version 8.0: "the above problem with version 7 is overcome by the version 8.0 surrogate key stage, which takes an empty text file (empty.txt), stores the last value information in that file, and uses it to generate the sequence values".
Design: before.txt -> SK -> Data Set

Properties of SK in version 8:
Option 1:
- Generated output column name = skid
- Source name = g:\data\empty.txt
- Source type = flat file
Option 2:
- Database type = oracle (DB2/oracle)
- Source name = sq9 (in Oracle: create sequence sq9) // it is like empty.txt
- Password = tiger
- User id = scott
- Server name = oracle
- Source type = database sequence
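The version 8 behaviour (remember the last key in a file and continue from it) can be sketched in Python as follows. The state-file format used here, a single integer in plain text, is an assumption for illustration only; the notes do not describe the real stage's file format. `generate_keys` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def generate_keys(rows, state_file, column="skid"):
    """Number rows from the last value stored in a state file, then save the new last value.

    An empty (or missing) state file means the sequence starts again at 1,
    which is why the notes say to empty empty.txt before re-running.
    """
    path = Path(state_file)
    text = path.read_text().strip() if path.exists() else ""
    last = int(text) if text else 0
    keyed = []
    for row in rows:
        last += 1
        keyed.append({column: last, **row})
    path.write_text(str(last))  # remember the last key for the next run
    return keyed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        state = Path(tmp) / "empty.txt"
        state.write_text("")
        print(generate_keys([{"SNO": 111}, {"SNO": 222}], state))  # skid 1, 2
        print(generate_keys([{"SNO": 333}], state))                # continues at 3
```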

DAY 46 DataStage Manager

Export: "Export is used to save a group of jobs for export, to wherever we want". Navigation: how to export?
DataStage toolbar:
- Change selection: ADD or REMOVE or SELECT ALL
- Job components to export; here there are three options:
  - Export job designs with executables (where applicable)
  - Export job designs without executables
  - Export job executables without designs
- Export to file: Source name: ..... (where we want to locate the export file)
- Type of export: we can export the file in two ways: dsx (7-bit encoded) or xml

Import: "It is used to import .dsx or .xml files into a particular project, and also to import some definitions, as shown below". Options of import are:
- DataStage components...
- DataStage components (xml)...
- External function definitions
- Web services function definitions
- Table definitions
- IMS definitions; for IMS there are two options: Database description (DBD) and Program Specification Block (PSB/PCB)
In DataStage components...:
- Import from file: give the source name to import...
- Import all / Import selected
- Overwrite without query
- Perform impact analysis

Generate Report: "It is used to generate a report for a job or a specific object; it generates the report instantly". For that, go to File -> Generate report:
- Report name
- Options: Use default style sheet / Use custom style sheet
After finishing the settings, the report is generated in the default position "/reporting/sendfile/ send file/ tempDir.tmp".

Node Configuration:
Q: How to see the nodes in a project?
- Run the job and go to the Director -> check the logs -> double-click on the main program: APT config file

Q: What are the Node Components?
1. Node name: logical CPU name.
2. Fast name: server name or system name.
3. Pools: logical area where the stages are executed.
4. Resource: memory associated with the node.

Node components are stored permanently on disk at the address below:
"c:\ibm\information server\server\parasets"
Node components are stored temporarily at the address below:
"c:\ibm\information server\scratch"

Q: Which node handles running each and every job, and what is the name of the configuration file?
- Every job runs on the APT node named below, which is the default for every job.
- The name of the configuration file is C:\ibm\.........\default.apt

Q: How to run a job on a specific configuration file?
- Job properties -> Parameters -> Add environment variables -> Parallel -> Compiler -> Config file (add $APT_CONFIG_FILE)

Q: How to create a new node configuration file?
- Tools -> Configurations; there we see default.apt.
- default.apt has the single-node information.
- We can create new nodes with the NEW option.
- Save the changes after creating new nodes with Save configuration As.

NOTE: It is best to create 8 or 16 nodes in a project, i.e., powers of two (2^0, 2^1, and so on), matching the CPUs.

Q: If a uniprocessor system with 1 CPU needs a minimum of 1 node to run a job, how many nodes does an SMP system with 4 CPUs need at minimum?
- Only 1 node.

Advanced Find: "It is a new feature in version 8". It is used to find the objects of a job, as listed below:
1. Where used
2. Dependency
3. Compare report

Q: How to run a job in a job?
Navigation: Job properties -> Job control -> Select a job ............. (here the Job Control Language (JCL) script appears)
- Dependencies -> Select job (compile this job first, before the main job)

Q: Repository of Advanced Find (i.e., the Advanced Find palette)?
- Name to find: Nav*
- Folder to search: D:\datastage\
- Type
- Creation

- Last modification
- Where used: find objects that use any of the following objects. Options: Add, Remove, Remove all.
- Dependencies of job

Q: Advanced Find of the repository through the toolbar?
- Cross project compare...
- Compare against
- Export
- Multiple job compile
- Add to palette
- Create copy
- Locate in tree
- Find dependencies

Q: How to find the dependencies of a job?
- Go to toolbar -> Repository -> Find dependency: all types of a job

DAY 47 DataStage Director

DS Director maintains:
- Schedule
- Monitor
- Views: Job view, Status view, Log view
- Message handling
- Batch jobs
- Unlocking

Schedule: "Schedule means a job can run at specific timings". To set the timings:
- Right-click on the job in the DS Director -> click on "Add to schedule..." and set the timings.
In real time, job sequences are scheduled with tools such as the following (this happens in production only):
- Control-M
- Cron tab
- Autosys

Purge: "It means cleaning, washing out or deleting the already created logs". In a job we can clear the job logs; there is a FILTER option, and by right-clicking we can filter. Navigation to set the purge:
- Toolbar -> Job -> Clear log (choose the option): Immediate purge / Auto purge

Monitor: "It shows the status of a job, the number of rows, where it executed, start time, elapsed time, rows/sec, and percentage of CPU used". Navigation to monitor a job:
- Right-click on the job -> click Monitor: "it shows the performance of a job", as below for a simple job.
Status   | No. of rows | Started at | Elapsed time | Rows/sec
Finished | 6           | sys time   | 00:00:03     | 2
Finished | 6           | sys time   | 00:00:03     | 2

NOTE: based on this we can check the performance tuning of a stage in a particular job.

Reasons for warnings:
Default warnings in a sequential file are:
1. Field "<column name>" has import error and no default value; data: {e i d}, at offset: 0
2. Import warning at record 0.
3. Import unsuccessful at record 0.
These three warnings can be solved by a simple option in the sequential file, i.e., First line is column names = True (the default is false).
Other reasons:
- Missing record delimiter "\r\n", saw EOF instead (format mismatch).
- When we are working on a lookup and the secondary stage has duplicates, we get a warning.
- Where there is a length mismatch, e.g., source length (10) and target length (20).
- When sorting on a different key column in a join.
- With the second stage in a merge.

Abort a job:
Q: How can we abort a job conditionally?
- When we run a job, we can keep a constraint there, such as Warnings: No limit / Abort job after: ...
- In the transformer stage: Constraint -> Otherwise/Log -> Abort after rows: 5 (if 5 records do not meet the constraint, the job simply aborts).
The same kind of constraint can also be kept in a Range Lookup.

Message handling: "If the warnings fail to be handled, then we come across message handling". Navigation for adding a message handler rule for the warnings:
- Job logs -> right-click on a warning -> Add rule to message handler; there are two options:
  - Suppress from log
  - Demote to information
Choose either of the above options and add the rule.

Batch jobs: "Executing a set of jobs in an order".
Q: How to create a batch?
Navigation: DS Director -> Tools -> Batch -> New (give the name of the batch) -> add jobs to the created batch -> just compile after adding them to the new batch.

Allow multiple instances: "The same job can be opened by multiple clients and run". If we do not enable the option, the job opens read-only, so you can't edit it; but a job can still be executed by multiple users at the same time in the Director. Navigation to enable multiple instances:
- Go to the toolbar in DS Designer -> Job properties -> check the box "Allow multiple instances".

Unlock the jobs: "We can unlock the jobs for multiple instances by releasing all the permissions". Navigation to unlock a job:
- DS Director -> Toolbar -> Job -> Cleanup resources -> Processes: Show by job / Show all -> Release all
To see the PIDs of jobs globally:
- DS Administrator -> General -> Environment variables -> Parallel -> Reporting -> Add (APT_PM_SHOW_PIDS) -> Set as (true/false)

DAY 48 Web Console Administrator

Components of the administrator:
- Administration: User & group -> Users: the user name & password are created here, and permissions are assigned.
- Session management: Active sessions.
- Reports: DS INDIA (server/system name) -> View report; we can create the reports for admin.
- Domain management: License -> update the license here -> Upload to review.
- Scheduling management: "It is to know what a user is doing" -> Scheduling views -> New -> schedule | run creation | task run | last update.

DAY 49 Job Sequencing

Stages of job sequencing: "It is for executing jobs in sequence, so that we can schedule the job sequencing", or "It controls the order of execution of jobs". A simple job processes in the order below:
- Extract
- Transform
- Load
Master jobs: "they control the order of execution".
Important stages in job sequencing are:
1. Job activity
2. Sequencer
3. Terminator activity
4. Exception handler
5. Notification activity
6. Wait for file activity

Job Activity: "It is the job activity that holds the job, and it has 1 input and n outputs".
How to drag a Job Activity onto the design canvas? There are two methods:
1. Go to toolbar -> View -> Repository -> Jobs -> just drag the job onto the canvas.
2. Go to toolbar -> View -> Palette -> Job activity -> just drag the icon onto the canvas.

Simple job: Student (output links OK, WAR, FAIL), Sequencer, student rank, Terminator activity.

Properties of the Job Activity: load the job you want to activate:
- Job name: D:\DS\scd\Job
- Execution action: RUN (options: Run / Reset if required, then run / Validate only / Reset only)
- Do not checkpoint run

Check Point: "The job restarts from where it aborted; this is called a check point". It is a special option that we must enable manually:
- Go to Job properties in DS Designer -> Enable check point

Parameter mapping: "If the job already has some parameters, we can map them to another job if we need".

Triggers: "It holds the link expression type, i.e., how to act".
Name of output link | Expression type          | Expression
OK                  | OK - (Conditional)       | "Executed OK"
WAR                 | Warning - (Conditional)  | "Execution finished with warnings"
Fail                | Failed - (Conditional)   | "Execution failed"
And some more options in "Expression type":
- Unconditional: "N/A" (the default)
- Otherwise: "N/A"
- User status = "<user-defined message>"
- Custom - (Conditional): "custom"

Terminator Activity: "It is a stage that handles the error if the job fails". Properties: it consists of two options, for when any subordinate jobs are still running:
- Send STOP requests to all running jobs, and wait for all jobs to finish (it's for job failure).
- Abort without sending STOP requests; wait for all jobs to finish first (it's for the server going down while the process is running).

Sequencer: "it holds multiple inputs and multiple outputs". It has two options or modes:
- ALL: it's for OK & WAR links
- ANY: it's for FAIL ('n' number of links)

Exception handler: "It handles the server interrupts". We don't connect any stage to it; it stays separate in the job.
A simple job for the exception handler: Exception handler -> Notification activity -> Terminator activity
Exception handler properties: "It has only general information".

Notification Activity: "It sends an acknowledgement in between the process". Options to fill in the properties:
- SMTP Mail server name
- Sender's email address
- Recipient's email address
- Email subject
- Attachments
- Email body

Wait for file Activity: "To place the job in pause".
- File name: D:\DS\SCD\LOAD (browse file)
- Two options: wait for file to appear / wait for file to disappear
- Do not timeout (no time length for the above options) / Timeout length (hh:mm:ss)

DAY 2E Perfor$ance tunin (5r5t )artition techni1ues I Sta es Partition techni1ues: !are t o categories" He" >ased: 1. 3ash 2. *odulus F. D$2 #. Aange He" !ess: 1. Same 2. Aound Aobin F. +ntire #. Aandom In key based partition techni=ue1 D$2 is used hen the target is database. D$2 and Aange techni=ues are used rarely. 3ash partition techni=ue1 o It is selected hen number of key columns ill be there. i.e., key columns 'R1( and hetro data types 'means different different data types( o 6ther than this situation e can select !modulus partition techni=ue". *odulus partition techni=ue1 o It distributes the data based on mod 0alues. o -nd mod formula is *6D'0alue> 5umber of nodes(

Navs notes

Page 160

DataStage

NOTE: Modulus has higher performance than Hash because of the way it groups the data, based on the mod value.
NOTE: But Modulus can only be selected if there is only one key column and only one data type, that is, integer.

In key less partition techniques:
Same: it never redistributes the data; it carries forward the partitioning of the previous stage.
Entire: it sends the same complete set of records to every node. Its purpose is to avoid mismatched records between operations.
Round Robin: generator stages such as Column Generator are associated with this partition technique.
o It is a better partition technique compared to Random.
Random: all key less partition technique stages use this technique as their default.

Performance tuning w.r.t. Stages:
If sorting has already been performed, we can use the JOIN stage.
Else, the LOOKUP stage is the best.
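The JOIN/LOOKUP choice depends on whether the inputs are already sorted. A rough Python sketch, assuming dict rows and inner-join semantics (`merge_join` and `hash_lookup` are hypothetical names, not DataStage operators): the join walks two key-sorted inputs once, in step, while the lookup loads the reference data into memory and needs no sort.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, List

Row = Dict[str, object]


def merge_join(left: Iterable[Row], right: Iterable[Row], key: str) -> List[Row]:
    """JOIN-style inner join. Both inputs MUST already be sorted on `key`."""
    get = itemgetter(key)
    lgroups, rgroups = groupby(left, key=get), groupby(right, key=get)
    lk, lg = next(lgroups, (None, None))
    rk, rg = next(rgroups, (None, None))
    out: List[Row] = []
    while lg is not None and rg is not None:
        if lk < rk:                      # left key is behind: advance left
            lk, lg = next(lgroups, (None, None))
        elif lk > rk:                    # right key is behind: advance right
            rk, rg = next(rgroups, (None, None))
        else:                            # keys match: pair up both groups
            right_rows = list(rg)
            for lrow in lg:
                for rrow in right_rows:
                    out.append({**lrow, **rrow})
            lk, lg = next(lgroups, (None, None))
            rk, rg = next(rgroups, (None, None))
    return out


def hash_lookup(stream: Iterable[Row], reference: Iterable[Row], key: str) -> List[Row]:
    """LOOKUP-style: build an in-memory table from the reference rows, then
    probe it for each stream row. No sorting needed; unmatched rows are dropped."""
    table: Dict[object, Row] = {}
    for ref in reference:
        table.setdefault(ref[key], ref)  # keep the first reference row per key
    return [{**row, **table[row[key]]} for row in stream if row[key] in table]
```

The trade-off the note describes shows up directly: `merge_join` gives wrong results on unsorted input, while `hash_lookup` works on any order but holds the whole reference set in memory.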

LOOKUP FILE SET: this option is used to remove duplicates in the Lookup stage.

SORT stage: if it is a complex sort, go to the Sort stage; else, go to link sort.

Remove Duplicates: if the data is already sorted, use the Remove Duplicates stage.

Sorting and removing duplicates together: go to link sort (Unique).
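Why sorted input matters for removing duplicates can be shown with a short Python sketch built on `itertools.groupby`, which only groups adjacent rows. The function names and the `retain` parameter (first or last row per key) are hypothetical, chosen for illustration:

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, List, Sequence

Row = Dict[str, object]


def remove_duplicates_sorted(rows: Sequence[Row], keys: Sequence[str],
                             retain: str = "first") -> List[Row]:
    """Keep one row per key group. Assumes `rows` is already sorted on `keys`;
    on unsorted input, duplicates that are not adjacent are NOT removed."""
    get_key = itemgetter(*keys)
    result: List[Row] = []
    for _, group in groupby(rows, key=get_key):
        members = list(group)
        result.append(members[0] if retain == "first" else members[-1])
    return result


def sort_and_remove_duplicates(rows: Sequence[Row], keys: Sequence[str]) -> List[Row]:
    """Link-sort-with-Unique analogue: sort on the keys, keep the first row per key."""
    return remove_duplicates_sorted(sorted(rows, key=itemgetter(*keys)), keys)
```

On unsorted input such as IDs 2, 1, 2, `remove_duplicates_sorted` keeps all three rows, while `sort_and_remove_duplicates` keeps one row per ID; that is the difference between the two tips above.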

Constraints:

When operations and constraints are both needed, go to the Transformer stage.


Else, if only constraints are needed, simply go to the FILTER stage.

Conversions: can be done in the Modify stage or the Transformer stage (the Transformer takes more compile time).

DAY 21 Compress, Expand, Generic, Pivot, XML input & output Stages

Compress Stage: "It is a processing stage that compresses the records into a single format, i.e., into a single file, or compresses the records into zip." It supports "1 input and 1 output".

Properties:
Stage
o Options: Command = (compress/gzip)
Input
o <do nothing>
Output
o Load the "Meta" data of the source file.

Expand Stage: "It is a processing stage that extracts the compressed data, i.e., it unzips the zipped data." It supports "1 input and 1 output".

Properties:
Stage
o Options: Command = (uncompress/gunzip)
Input
o <do nothing>
Output
o Load the Meta data of the source file for further processing.

Encode Stage: "It is a processing stage that encodes the records into a single format with the support of a command line." It supports "1 input and 1 output".

Properties:
Stage
o Options: Command line = (compress/gzip)
Input
o <do nothing>
Output
o Load the "Meta" data of the source file.

Decode Stage: "It is a processing stage that decodes the encoded data." It supports "1 input and 1 output".

Properties:
Stage
o Options: Command line = (uncompress/gunzip)
Output
o Load the "Meta" data of the source file.
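Compress/Expand (and Encode/Decode with a gzip/gunzip command line) wrap a compression command around the record stream. A minimal Python sketch of that round trip, using the standard-library `gzip` module in place of the external command; the function names are hypothetical, and the real stages work on DataStage data sets rather than newline-joined text:

```python
import gzip
from typing import List


def compress_records(records: List[str], encoding: str = "utf-8") -> bytes:
    """Compress-stage analogue (command = gzip): one record per line, then gzip.
    Assumes no record contains a newline character."""
    return gzip.compress("\n".join(records).encode(encoding))


def expand_records(blob: bytes, encoding: str = "utf-8") -> List[str]:
    """Expand-stage analogue (command = gunzip): gunzip, then split into records.
    An empty input list round-trips to an empty list."""
    text = gzip.decompress(blob).decode(encoding)
    return text.split("\n") if text else []
```

Because the Expand side has to reverse exactly what the Compress side did, the two stages are configured with matching commands (gzip with gunzip, compress with uncompress).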

Generic Stage: "It is a processing stage in which any operator can be called, but the operator's properties must be fulfilled." It supports "n inputs and n outputs, but no rejects".

When compiling the job, the job-related OSH code will be generated.

The Generic stage can call ANY DataStage operator here, but it must fulfil that operator's properties.

Its purpose is migrating server jobs to parallel jobs (IBM has X-Migrator for this conversion).

Properties:
Stage
o Options
Operator: copy (we can write any stage operator here)
Input
o <do nothing>
Output
o Load the Meta data of the source file.

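Pivot Stage: "It is a processing stage that converts columns into rows: a set of columns in one input row is mapped to a single output column, giving one output row per pivoted column."

It supports "1 input and 1 output".

Properties:
Stage: <do nothing>
Input: <do nothing>
Output: define the pivoted column with
o Column name, SQL type (VarChar), Length (25)
o Derivation: the input columns to pivot, comma separated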
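The comma-separated Derivation above can be mimicked in a few lines of Python. This is a sketch assuming dict rows; `horizontal_pivot`, `keep_cols`, and the sample column names are hypothetical:

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def horizontal_pivot(rows: Sequence[Row], keep_cols: Sequence[str],
                     pivot_cols: Sequence[str], out_col: str) -> List[Row]:
    """Map the set of `pivot_cols` in each input row to one output column,
    emitting one output row per pivoted column (columns into rows)."""
    out: List[Row] = []
    for row in rows:
        for col in pivot_cols:           # plays the role of the Derivation list
            new_row = {c: row[c] for c in keep_cols}
            new_row[out_col] = row[col]
            out.append(new_row)
    return out
```

A Derivation string such as `"q1, q2, q3"` could be turned into `pivot_cols` with `[c.strip() for c in "q1, q2, q3".split(",")]`; one input row with three quarterly columns then becomes three output rows.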

XML Stages: "These are real-time stages in which the data is stored in single records or aggregated within the XML format."
The XML stage is divided into two types, they are:
1. XML Output
2. XML Input

XML Input:
