
DataStage 8 Tutorial: Using Range Lookups Page 1 of 8

Tooling Around in the IBM InfoSphere


by Vincent McBurney (Deloitte Manager)
DataStage 8 Tutorial: Using Range Lookups


Vincent McBurney (Deloitte Manager) posted 6/13/2007 | Comments (13)

Looking at the new range lookup functionality in DataStage 8.

DataStage 8 comes with range lookup functionality within the Lookup stage, a feature
that came in at number four in my post "My top ten features in DataStage Hawk". A field
on an input link can be compared to two fields on a lookup link, or vice versa, using a
between clause that returns one or more rows from the lookup link.

This post has an example of a range lookup using pictures. I have also filled in a wiki
page explaining the steps of the different types of range lookups, "HOWTO:Do a range
lookup in DataStage 8", where you can add your own examples or fix the instructions. It is
part of my Wiki Wednesday series of using wiki entries to describe an aspect of data
integration.

DataStage has always performed joins very efficiently when there are exact key fields that
match, using the Lookup, Join or Merge stage. Range lookups are more challenging because
they are a less efficient way to join, whether you are doing it in an ETL job or in a
database. In DataStage 7 you can do a range lookup using a Lookup stage and a Filter stage,
using a sparse lookup, or by loading both tables into a database staging area and joining
them in SQL. This tutorial shows how to do it in a single Lookup stage, which gives a much
simpler design.
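
For comparison, the staging-area approach can be sketched as a SQL range join. This is only an illustration run through Python's built-in sqlite3 module; the table names, column names and sample rows are made up for the example and are not taken from the job in this post.

```python
import sqlite3

# In-memory database standing in for a staging area (hypothetical tables and data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee_history (employee TEXT, country TEXT,
                                   start_date TEXT, end_date TEXT);
    CREATE TABLE national_holiday (country TEXT, holiday_date TEXT);
    INSERT INTO employee_history VALUES
        ('E1', 'AU', '2006-01-01', '2006-06-30'),
        ('E2', 'AU', '2006-02-01', '2006-03-31');
    INSERT INTO national_holiday VALUES
        ('AU', '2006-01-26'),
        ('AU', '2006-04-25'),
        ('AU', '2006-12-25');
""")

# Equality join on country plus a range condition on the date.
# ISO-formatted date strings compare correctly as plain text.
rows = conn.execute("""
    SELECT h.employee, n.holiday_date
    FROM employee_history AS h
    JOIN national_holiday AS n
      ON n.country = h.country
     AND n.holiday_date BETWEEN h.start_date AND h.end_date
    ORDER BY h.employee, n.holiday_date
""").fetchall()

for employee, holiday_date in rows:
    print(employee, holiday_date)
```

An inner join drops history rows with no holiday in range and returns one row per matching holiday, which is the same behaviour the single Lookup stage is configured for later in this post.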

Challenge

http://it.toolbox.com/blogs/infosphere/datastage-8-tutorial-using-range-lookups-16911 10/11/2008

I will use a data integration challenge to demonstrate a range lookup. Let's say someone
with way too much time on their hands has decided they want to know how much money
public holidays have cost the company over the years. Since timekeeping in the company
has always been rubbish they don’t currently track it. They do have employee records with
a history of roles, salaries and locations over the years. How would you use DataStage to
tally the cost of public holidays?

Answer
The tricky part is that some public holidays are national, some are state based, some are
specific to a city and some to a region. Some are a one off - Australia had an unofficial
public holiday after we won a boat race. I have created three lookup files - one for each
type of public holiday. In these pictures I will show how to perform the lookups for just the
national holidays - the same process can be used to create rows for the other types as
well.

This can be done using a simple lookup design:

The employee history is processed into a single file showing each change in the location
and charge rate of each person. The history of public holidays over the years is manually
loaded into text lookup files, with a different file for national, state and local holidays:

In DataStage 7 we can only join on "country" and "state"; we cannot join on date because it
needs to be a range lookup. In DataStage 8 we have a new "Key Type" field with values of
"Equality", "Caseless" and "Range". Choosing Range for the date field lets us compare the
field to two fields on the history stream. Set the Key Type to Range and then double-click
right next to it in the empty Expression field to bring up the range expression form shown
here:

The form loads the Holiday Date field on each row. We manually set the Start_Date and
End_Date fields and choose the operators ">=" and "<=" to create a range check. This
returns multiple holidays from the lookup for the duration of that history row: every
public holiday between the start and end dates is returned.
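
The logic of that range check can be sketched in a few lines of Python. This is not how DataStage implements the lookup internally, just a plain illustration of the condition; the function name and the sample rows are hypothetical, while country, Holiday_Date, Start_Date and End_Date follow the fields described above.

```python
from datetime import date


def range_lookup(history_row, holidays):
    """Return every holiday that matches the row on country (Equality key)
    and falls between the row's Start_Date and End_Date (Range key)."""
    return [
        holiday
        for holiday in holidays
        if holiday["country"] == history_row["country"]
        and history_row["Start_Date"] <= holiday["Holiday_Date"] <= history_row["End_Date"]
    ]


# Hypothetical sample data for illustration only.
holidays = [
    {"country": "AU", "Holiday_Date": date(2006, 1, 26)},
    {"country": "AU", "Holiday_Date": date(2006, 4, 25)},
    {"country": "NZ", "Holiday_Date": date(2006, 2, 6)},
]
history_row = {
    "employee": "E1",
    "country": "AU",
    "Start_Date": date(2006, 1, 1),
    "End_Date": date(2006, 3, 31),
}

matches = range_lookup(history_row, holidays)
print(matches)
```

Both comparisons are inclusive, matching the ">=" and "<=" operators chosen in the form, so a holiday that falls exactly on the start or end date is returned.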

We then set up the rest of the lookup stage properties as per a normal lookup:

Every input row of employee history can return zero or more output holiday rows, so we
need to set the lookup properties to drop rows that have no holiday match and to accept
multiple rows where there is more than one match. These rows can then be saved to a
table or put through an aggregation stage and turned into statistics.
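
As a rough sketch of what those two settings and a follow-on aggregation amount to, the Python below drops unmatched history rows, emits one output row per matching holiday, and then totals a cost per year. The daily_rate field, the assumption that one holiday costs one day at that rate, and the sample data are all invented for this example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical employee history and holiday rows.
history = [
    {"employee": "E1", "country": "AU", "daily_rate": 400.0,
     "Start_Date": date(2006, 1, 1), "End_Date": date(2006, 6, 30)},
    {"employee": "E2", "country": "AU", "daily_rate": 500.0,
     "Start_Date": date(2006, 2, 1), "End_Date": date(2006, 3, 31)},
]
holidays = [
    {"country": "AU", "Holiday_Date": date(2006, 1, 26)},
    {"country": "AU", "Holiday_Date": date(2006, 4, 25)},
]

output_rows = []
for row in history:
    for holiday in holidays:
        if (holiday["country"] == row["country"]
                and row["Start_Date"] <= holiday["Holiday_Date"] <= row["End_Date"]):
            # Multiple matches: one output row per matching holiday.
            output_rows.append({**row, "Holiday_Date": holiday["Holiday_Date"]})
    # No match: nothing is appended, so the history row is dropped.

# Aggregation step: assume each holiday costs one day at the row's daily_rate.
cost_by_year = defaultdict(float)
for out in output_rows:
    cost_by_year[out["Holiday_Date"].year] += out["daily_rate"]

print(len(output_rows), dict(cost_by_year))
```

In this sample, E1 produces two output rows and E2 produces none, so only E1 contributes to the yearly total.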

Repeating this type of range lookup against a lookup file with annual, state and city
holidays will bring in all the public holidays in scope, each using different Equality joins
but the same Range join:

The range lookup can also be used to compare a single value in the primary stream to two
values on the lookup; the steps are described in the wiki entry for range lookups.
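
A minimal sketch of that reverse direction, where one value on the stream is tested against a pair of fields on each lookup row, might look like this. The rate-period table and its field names (valid_from, valid_to, rate) are purely hypothetical and are not part of the holiday job above.

```python
from datetime import date

# Hypothetical lookup rows, each carrying its own range.
rate_periods = [
    {"rate": 400.0, "valid_from": date(2005, 1, 1), "valid_to": date(2005, 12, 31)},
    {"rate": 450.0, "valid_from": date(2006, 1, 1), "valid_to": date(2006, 12, 31)},
]


def lookup_by_range(value, reference_rows):
    """Return the lookup rows whose [valid_from, valid_to] range contains value."""
    return [
        row for row in reference_rows
        if row["valid_from"] <= value <= row["valid_to"]
    ]


# A single date from the primary stream is compared to two fields per lookup row.
found = lookup_by_range(date(2006, 1, 26), rate_periods)
print(found)
```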

Disclaimer: The opinions expressed herein are my own personal opinions and do not
represent my employer's view in any way.

Comments (13)

Raja writes: 6/14/2007 #

Good effort

Priyadarshi writes: 6/14/2007 #

I think "reference link range lookup" has a problem, but overall it's a great feature
added in Hawk.

A nice effort by Vincent McBurney for sorting out the newly added features and
describing them as well.

regards

ttteety writes: 6/15/2007 #

Good post Vincent.

But I have a question: what if we want to do a range lookup from lookup files? Can we do
that in the Lookup stage's range lookup?

Thanks

Vincent McBurney writes: 6/15/2007 #

Yes, you can do range lookups against lookup files. Parallel jobs let you use almost
anything as a source for lookup data: sequential files, database stages, lookup filesets,
datasets etc. Any of these lookup types can be the source of a range lookup. DataStage
converts all of these sources into an internal data format on startup. Small sources
get loaded into RAM; larger sources overflow into lookup files on the nodes.
Lookup filesets are handy in that they are already loaded onto the nodes and need little
extra work on startup.

Raja writes: 6/18/2007 #

Hi Vincent,
Do you have any posts on SCD implementation?

Vincent McBurney writes: 6/18/2007 #

You read my mind! Yes, I will be looking at the new slowly changing dimension stage in
the next couple of weeks and hope to have a post about it.

Raghu writes: 6/19/2007 #

Hi vincent, thanks for the good info. Keep posting

Thanks,
Raghu

Lakshmi writes: 7/6/2007 #

Hi Vincent, Its well explained. Thanks for all the effort.


regards
Lakshmi

Andy Sorrell writes: 9/15/2007 #

Great Post (as usual) Vincent.

I can confirm what Priyadarshi mentioned above - in the current release the reference
link range lookups don't work. I took a "What's new in DataStage / QualityStage 8.0"
class from IBM a couple of weeks ago and the instructor confirmed that it has a problem
but they are working on it and it will be corrected "soon".

praveen writes: 12/26/2007 #

Hi all
I am new to DataStage and I need to write a routine and call it. How can I write the
routine? If possible, please give a small example so that I can get an idea.
Thanks in advance

rama writes: 6/1/2008 #

Hi guys, I am new to this field and I need some information on DataStage. Please help
me with how to get started with this technology.

priya writes: 8/11/2008 #

Hi,
Can anybody let me know what the major differences are between DataStage 7.5 and 8.0?

Thanks in advance
Priya

Vincent McBurney writes: 8/19/2008 #

@Priya, Have a look at my blog post "What's new in DataStage 8" and look for the
DataStage wiki page on this site that has the changes introduced in each version of
DataStage.
