
Everything You Need to Know about the DataStage 8 Certification Exam
All the IBM links, Prometric test center links, certification tips, study guide and overview of
the DataStage 8 Certification Exam.
DataStage 8 Certification Exam has arrived! An eagle-eyed member of www.dsxchange.com
spotted the exam and posted it to the forum.
This is my DataStage 8 Certification Home Page and I will be updating it with any new
information or links. You can find this page easily using the shortened URL
tinyurl.com/ds8exam. Remember that link - it will bring you back to this page! Feel free to
leave comments if you pass your exam or have your own tips.


DataStage 8 Certification Exam Information
There are five fewer questions than in the DataStage 7.5 exam, which makes it harder to pass! If
you do not know the answer to a question on an obscure part of DataStage such as NLS or column
propagation, it will have a bigger impact on your score.
Number of questions: 70
Time allowed in minutes: 90
Required passing score: 75%
Test languages: English
Cost: USD$175 (may vary between countries)
Test venue: one of the Prometric testing centres, available in most major cities.
DataStage 8 Certification Exam Links
These are the exam pages on the IBM website. If you switch to the Prometric test site, where
you find test venues and book test appointments, make sure you look for test 000-418 in
the test list boxes.
Test Homepage.
Test Objectives - I have copied them below and added my own notes in italics.
Sample questions.
Test Overview.
Authorized Prometric test centers (worldwide testing).
- Locate A Test Site
- Schedule an Appointment
- Exam Fees
IBM DataStage Training Courses.
DataStage 7.5 Certification Exam Previous Blog Posts
I have some blog posts on the previous version 7.5 exam that might come in useful:
DataStage Certification - how to pass the exam
How To: Study Guide for DataStage Certification
A new look at DataStage certification
Can I prepare for certification on an old version of DataStage?
QualityStage Certification exam has arrived
IBM data profiling gets a boost with Information Analyzer certification exam
DataStage Certification Exam Review
The DataStage 8 Certification exam looks very similar to the 7.5 version with extra questions
on the Slowly Changing Dimension stage. The exam outline does not seem to have any
questions about some new version 8 functionality such as Parameter Sets or range based
Lookups. It does not seem to have any questions about the Metadata Server and console
apart from a generic and vague "describe how to properly configure DataStage V8.0."
The exam is heavily weighted to parallel jobs, with only a couple of sections covering metadata
and sequence jobs that Server Edition users would be familiar with. It would be difficult to
pass this exam without having access to a parallel job play area - but not impossible. If you
read up on parallel jobs and follow some threads on www.dsxchange.com and read the
resources in my post 40 DataStage Learning, Tutorial and Certification Online Resources
you might be able to get certified in parallel jobs without having used them! It's not an
approach I would recommend. I think you are better off learning the tool by using it.
I will be sitting this exam at the IBM IOD Conference in October and I will let you know
how I go. Certification exams are free at IBM IOD conferences.
DataStage Certification Exam Tips
You need to get 53 of 70 questions right to pass! Go through the exam once answering all
the questions you know and write down the numbers of the questions you don't know. Add an
asterisk to those questions that will be time-consuming to answer.
In the second pass go to the hard questions that are relatively quick to answer and get answers
in for those.
In the third pass revisit the hard time consuming questions. You might find answers you
gave for other questions have helped you with the hard questions. Sometimes a couple
questions overlap and give clues to each other.
The idea is to make sure you have enough time to answer all the easy questions and get up to
your 53 right answers and not to waste time on a hard question that you might get wrong
anyway. Leave the hard questions until last.

DataStage Certification Exam Test Preparation
1. DataStage v8 Configuration (5%)
1. Describe how to properly configure DataStage V.8.0.
1. This is kind of vague but focus on how DataStage 8 gets attached to a
Metadata Server via the Metadata Console and how security rights are set up.
2. Read up on configuring DB2 and Oracle client and ODBC.
3. Get to know the dsenv file. Read the DataStage Installation Guide for post-
installation steps.
2. Identify tasks required to create and configure a project to be used for V.8.0 jobs.
1. DataStage Parallel Job Advanced Developer Guide has a section on project
specific environment variables such as reporting, c++ compile options,
tracing and optimisation. You do not have to memorise them but at least read
through each setting.
2. Get to know the DataStage Administrator Create Project and Project Settings.
3. Given a configuration file, identify its components and its overall intended purpose.
1. You can practice with different configuration files with one, two and four
nodes and use the Job Monitor to see how each configuration impacts the
job. Turn on job scoring to see what the job really looks like at runtime.
Read up on node pools and memorise what is in the manuals.
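To make the components concrete, here is a minimal sketch of a two-node configuration file of the kind the exam asks about (the host name and paths are illustrative, not defaults):

```
{
  node "node1"
  {
    fastname "etlhost"
    pools ""
    resource disk "/data/ds/work" {pools ""}
    resource scratchdisk "/data/ds/scratch" {pools ""}
  }
  node "node2"
  {
    fastname "etlhost"
    pools ""
    resource disk "/data/ds/work" {pools ""}
    resource scratchdisk "/data/ds/scratch" {pools ""}
  }
}
```

Point APT_CONFIG_FILE at different versions of a file like this to compare one, two and four node runs.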
2. MetaData (5%)
1. Demonstrate knowledge of Orchestrate schema.
1. An Orchestrate schema is just a table definition in a text file format. Try saving a
couple to file and get to know how they are put together.
2. Try reading and writing to a dataset using a schema file definition.
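As a sketch, a saved Orchestrate schema file looks something like this (the column names are made up):

```
record
(
  customer_id: int32;
  customer_name: string[max=50];
  order_date: nullable date;
)
```

Point a Sequential File or Data Set stage at a schema file like this instead of loading a table definition and the columns are resolved at runtime.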
2. Identify the method of importing metadata.
1. Read my blog post Navigating the many paths of metadata for DataStage 8 -
there are actually quite a few different ways to import metadata in version 8.
2. Try imports through the metadata bridges, DataStage Designer plugin import
and the DataStage Designer Connector import.
3. Given a scenario, demonstrate knowledge of runtime column propagation.
1. Try an exercise where you drop and regain columns in a job using
propagation. For example generate a set of columns and push them through
three copy stages. Remove two columns from the middle copy stage and add
them back again in the third copy stage with column propagation turned on.
3. Persistent Storage (10%)
1. Given a scenario, explain the process of importing/exporting data to/from framework
(e.g., sequential file, external source/target).
1. Get to know the Sequential File Stage, the Column Import Stage and the
External Source Stage. All three do roughly the same thing but have different
options. They all provide a stream of data to the same import function that
turns data into orchestrate parallel data.
2. Given a scenario, describe proper use of a sequential file.
1. Read a sequential file.
2. Read up on Sequential Files stage options such as multiple readers, first row
is column names, reject rows.
3. Compare using a file name to using a file wildcard.
4. Try reading two files at once by configuring the stage with two file names.
3. Given a scenario, describe proper usage of CFF (native not plug-in).
1. It can be hard to practice with the CFF stage if you do not have CFF
data! Read up on it and see if you can find a COBOL copybook - try Googling
for one. If you can get your hands on one, import it as a CFF file definition,
load the metadata into the CFF stage and pretend you are using it to
extract data. That will show you how the stage works.
2. Have a look at all the tabs on the CFF stage for data formats, record filters
etc.
4. Describe proper usage of FileSets and DataSets.
1. These are the hash files of Enterprise Edition. Try comparing a Lookup
Fileset to a Dataset and watch what happens in the temp directory and node
scratch directory for each type of job as the job is running.
2. See what files are created behind the scenes.
3. Compare a very large lookup to a very small lookup to see how filesets and
dataset lookups are different.
5. Describe use of FTP stage for remote data.
1. You can just read up on this. Basically DataStage wraps FTP and Sequential
File read into a single stage so you read the file as it is transferred.
6. Identify importing/exporting of XML data.
1. It is very easy to get your hands on XML data - lots of it around. Try
importing it into DataStage and read the XML Input stage and look for the
XML tutorials on my blog post 40 DataStage Learning, Tutorial and
Certification Online Resources. The trick is that the XML Import Stage is
looking to receive either a huge chunk of XML data or an XML file name
passed into it from a preceding stage. This means you have to pass it the data
from something like a Sequential File Stage or the External Source stage.
2. Have a look at the XPATH statements created in the Description field of a
Table Definition from an XML metadata import.
3. Read up on how the KEY field of an XML Import determines how an XML file
gets flattened to a flat table.
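The key-driven flattening idea can be seen outside DataStage with a short Python sketch (the element names and values here are made up; in DataStage the KEY field on the XML Input stage plays the role of the repeating element):

```python
import xml.etree.ElementTree as ET

# The repeating <order> element acts like the KEY field:
# each occurrence becomes one flattened output row.
xml_data = """
<customer id="C1">
  <name>Acme</name>
  <order><num>1001</num><amount>50</amount></order>
  <order><num>1002</num><amount>75</amount></order>
</customer>
"""

root = ET.fromstring(xml_data)
rows = []
for order in root.findall("order"):        # iterate the repeating "key" element
    rows.append({
        "customer_id": root.get("id"),     # parent values repeat on every row
        "name": root.findtext("name"),
        "order_num": order.findtext("num"),
        "amount": order.findtext("amount"),
    })

print(rows)  # two rows, one per <order>
```

Choosing a different element as the key changes how many rows come out, which is exactly the behaviour to experiment with in the stage.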
4. Parallel Architecture (10%)
1. Given a scenario, demonstrate proper use of data partitioning and collecting.
1. The DataStage Parallel Developers Guide has a very good description of data
partitioning that all developers should read.
2. Given a scenario, demonstrate knowledge of parallel execution.
1. A good way to learn this is to create a job that generates a sequence of
numbers in an ID field and then run that on multiple nodes, use the Monitor
tool to watch what happens to Transformers, Lookups and Sequential file
stages in this job.
5. Databases (10%)
1. Given a scenario, demonstrate proper selection of database stages and database
specific stage properties.
1. The questions are likely to be about Oracle, DB2 and SQL Server. Most likely
the first two. You can read up on them in the Developers Guide and
sometimes there is an additional PDF. You can also add these stages to a job
even if you do not have access to the database itself and play with the
properties of the stage.
2. Identify source database options.
1. Add each type of database stage to a job and play around with the different
settings.
3. Given a scenario, demonstrate knowledge of target database options.
1. Compare inserts and updates to loads and bulk loads.
2. Compare generated SQL from the different types of upsert options (insert then
update, update then insert, replace).
3. Use www.dsxchange.com to search for discussions on array size and
transaction size for database stages.
4. Given a scenario, describe how to design v.8.0 ETL job that will extract data from a
DBMS, combine with data from another source and load to another DBMS target.
5. Demonstrate knowledge of working with NLS database sources and targets.
1. I did not get an NLS question in my version 7.5 exam but it may pay to read
the NLS guide at least once so you are familiar with it. Focus on NLS with
databases as this seems to be the only section that mentions NLS.
6. Data Transformation (10%)
1. Given a scenario, demonstrate knowledge of default type conversions, output
mappings, and associated warnings.
1. Read the DataStage Developers Guide and understand implicit conversion -
where you map a field of one data type to a field of a different data
type without any specific conversion code.
2. Read up on explicit conversion where you use a function like
StringtoTimestamp to change the metadata during the mapping.
3. Try putting a string field into a numeric, putting an invalid date into a date
field, putting a null into a not null to see what job warnings you get.
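As a loose analogy outside DataStage, the implicit/explicit distinction looks like this in Python (StringToTimestamp is the DataStage function; strptime is only its rough Python counterpart):

```python
from datetime import datetime

# Explicit conversion: you name the conversion and its format, the way
# StringToTimestamp does in a Transformer derivation.
ts = datetime.strptime("2008-10-01 09:30:00", "%Y-%m-%d %H:%M:%S")

# Implicit conversion: the engine coerces compatible types for you. An
# incompatible value fails - in DataStage you get a job warning and a
# dropped or defaulted value rather than an exception.
n = int("42")         # string to integer, succeeds
try:
    bad = int("42a")  # invalid value: Python raises, DataStage warns
except ValueError:
    bad = None

print(ts.year, n, bad)
```

The exam questions are about which conversions happen silently, which need a function, and which produce warnings, so try each of the exercises above and read the job log.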
2. Given a scenario, demonstrate proper selections of Transformer stage vs. other stages.
1. Read my blog post Is the DataStage parallel transformer evil? for a discussion
on what the Transformer does.
2. For scenarios when you need the Modify stage try DataStage Tutorial: How to
become a Modify Stage Zen Master.
3. Given a scenario, describe Transformer stage capabilities (including: stage variables,
link variables, DataStage macros, constraints, system variables, link ordering,
@PARTITIONNUM, functions).
1. A good way to understand partitioning functions is to try and build your own
Transformer Parallel Job Counter.
2. Go into a Transformer and click in a derivation field and explore every part of
the right-click menu - try adding job start time, job name, @NULL,
@DAY and various other macros to an output link and send it to a Peek
stage to read the results.
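For the counter exercise, one commonly used derivation for a unique ID across partitions is @PARTITIONNUM + (@NUMPARTITIONS * (@INROWNUM - 1)) + 1. A quick Python simulation shows why the values never clash (the partition and row counts here are made up):

```python
# Simulate the parallel-job counter derivation:
#   @PARTITIONNUM + (@NUMPARTITIONS * (@INROWNUM - 1)) + 1
# Each partition generates an interleaved stream of numbers with no overlaps.

NUM_PARTITIONS = 4

def partition_counter(partition_num, in_row_num):
    """in_row_num is 1-based, as @INROWNUM is in DataStage."""
    return partition_num + (NUM_PARTITIONS * (in_row_num - 1)) + 1

# Three rows on each of four partitions -> twelve unique values.
values = sorted(
    partition_counter(p, r)
    for p in range(NUM_PARTITIONS)
    for r in range(1, 4)
)
print(values)  # 1 through 12 with no gaps or duplicates
```

Building this in a Transformer and peeking the output on different node counts is a good way to internalise what the partitioning macros return.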
4. Demonstrate the use of Transformer stage variables (e.g., to identify key grouping
boundaries on incoming data).
1. Try a Stage Variable scenario where you compare values in the current row to
values in the previous row of data by storing values in stage variables. Have
a look at this DSXChange Vertical Pivot thread for ideas.
2. Try a Stage Variable scenario where you do null handling of input columns in
stage variables and then use the results in derivation functions.
5. Identify process to add functionality not provided by existing DataStage stages. (e.g.,
wrapper, buildops, user def functions/routines).
1. Read the DataStage Advanced Parallel Job Developers Guide for a
description of wrappers and buildops.
2. Step through the parallel function tutorial in my post 40 DataStage Learning,
Tutorial and Certification Online Resources. It's good to know how parallel
routines work.
6. Given a scenario, demonstrate proper use of SCD stage
1. Read up on the SCD stage and have a look at the SCD online tutorial in my
post 40 DataStage Learning, Tutorial and Certification Online Resources. It
can be tricky understanding the SCD stage if you have never done star
schemas and I wouldn't spend a lot of time trying to learn. Just learn the
dynamics of the stage and save your learning of star schemas for another time.
7. Demonstrate job design knowledge of using RCP (modify, filter, dynamic
transformer).
1. This is a tough one - dynamic transformer? I am not even sure what that is.
2. Practice column propagation through the modify, filter and transformer
stages.
7. Job Components (10%)
1. Demonstrate knowledge of Join, Lookup and Merge stages.
1. There is a table in the Parallel Job Developers Guide that compares the three
stages. Try to get to know the differences.
2. Given a scenario, demonstrate knowledge of SORT stage.
1. Read my blog post Sorts to the left of me, sorts to the right to understand all
the different ways to control sorting in a parallel job.
3. Given a scenario, demonstrate understanding of Aggregator stage.
1. Read up on the stage and practice with it.
2. Try doing a sum and a count at the same time! Look for the workaround where
you create a column set to the number 1.
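Outside DataStage, the workaround looks like this Python sketch: the constant column of 1 turns the count into just another sum (the column and group names are made up):

```python
# Mirror of the Aggregator workaround: add a constant column "one" upstream,
# then a single grouped summing pass produces both a sum and a count.
rows = [
    {"region": "N", "amount": 10},
    {"region": "N", "amount": 25},
    {"region": "S", "amount": 40},
]

# "Transformer" step: create the constant column.
for row in rows:
    row["one"] = 1

# "Aggregator" step: sum both columns per group key.
totals = {}
for row in rows:
    agg = totals.setdefault(row["region"], {"amount_sum": 0, "row_count": 0})
    agg["amount_sum"] += row["amount"]
    agg["row_count"] += row["one"]   # summing the constant column = counting

print(totals)
```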
4. Describe proper usage of change capture/change apply.
1. Change Capture splits a data stream into inserts, updates, deletes and
unchanged rows by adding a tag field by comparing a before set of data to an
after set of data. Change Apply lets you make those changes to a dataset.
2. I don't really use Change Apply that much since I'm usually applying changes
to a target database table.
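A rough Python sketch of the Change Capture comparison described above (DataStage actually emits a numeric change_code column; the string tags here are just for readability, and the data is made up):

```python
# Compare a keyed "before" and "after" data set and tag each output row,
# the way Change Capture does before Change Apply replays the changes.
before = {1: "red", 2: "green", 3: "blue"}
after  = {2: "green", 3: "navy", 4: "gold"}

changes = []
for key in sorted(before.keys() | after.keys()):
    if key not in before:
        changes.append((key, after[key], "insert"))
    elif key not in after:
        changes.append((key, before[key], "delete"))
    elif before[key] != after[key]:
        changes.append((key, after[key], "edit"))
    else:
        changes.append((key, after[key], "copy"))

print(changes)
```

Running the real stages over two small data sets and peeking the change code column shows the same four-way split.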
5. Demonstrate knowledge of Real-time components.
1. This is kind of vague. I am guessing they are referring to the stages in the
Real Time folder such as XML Input and WISD input. Read up on them and
add them to a job to view the properties.
8. Job Design (10%)
1. Demonstrate knowledge of shared containers.
1. Create a big job and turn part of it into a shared container.
2. Try using that container in two different jobs and examine how job parameters
are shared.
3. Try a shared container that works on a small number of rows and uses column
propagation to pass through the other rows. Use this in multiple jobs.
2. Given a scenario, describe how to minimize SORTS and repartitions.
1. Read my blog post Sorts to the left of me, sorts to the right to understand all
the different ways to control sorting in a parallel job.
2. Add several stages to a job that need sorted data - Remove Duplicates, Join
and Sequential File. Compare what happens to the sort symbols on links
when you configure these stages and choose different key fields.
3. Demonstrate knowledge of creating restart points and methodologies.
1. This one moves into methodology more than tool technical details. Do a
search on www.dsxchange.com for discussions on vertical banding and
rollback.
2. Remember that in most job abort situations you cannot restart a job from
where it left off - it is hard to calculate what row number a parallel
partitioning job was up to, and it's usually safest to roll back all changes and
start again or overwrite the target file/dataset.
4. Given a scenario, demonstrate proper use of standards.
1. Try reading the DataStage standards chapter from IBM's own Excellent
DataStage Documentation and Examples in New 660 Page IBM RedBook.
5. Explain the process necessary to run multiple copies of the source (job multi-
instance).
1. Not many people use multiple instance jobs with parallel jobs since they are
able to partition data on the fly. Read up on Multiple Instance Jobs in the
DataStage Designer Guide.
6. Demonstrate knowledge of real-time vs. batch job design.
1. Read through the Information Server Services Director guide just once to get
the gist of turning a DataStage job into a Web Service. The main difference is
that a real time job will tend to input and output data as XML so it can share
data with other SOA services.
9. Monitoring and Troubleshooting (10%)
1. Given a scenario, demonstrate knowledge of parallel job score.
1. See the Advanced Parallel Developers Guide for a description on how to turn
on job scoring.
2. Turn it on and leave it on as you practice for the exam and keep looking at job
scores until you understand how it works.
2. Given a scenario, identify and define environment variables that control DataStage
v.8.0 with regard to added functionality and reporting.
1. DataStage Parallel Job Advanced Developer Guide has a section on project
specific environment variables such as reporting, c++ compile options,
tracing and optimisation.
3. Given a process list (scenario), identify conductor, section leader, and player process.
1. Practice a job on four nodes and watch the scoring to see what happens to
that job when it runs.
4. Given a scenario, identify areas that may improve performance (e.g., buffer size,
repartitioning, config files, operator combination, etc.).
1. Try re-designing a job by adding and removing sorts, remove duplicates,
copies, transformers and database stages and keep an eye on the partitioning
and repartitioning symbols that show up on the job links to see how they are
impacted.
2. Try a job with a lookup stage and a transformer together and put a reject
stage on the transformer. See what happens when the lookup stage doesn't
find a lookup row. Switch combine operators on and off to compare the
difference.
5. Demonstrate knowledge of runtime metadata analysis and performance monitoring.
1. Try out the various DataStage performance reports and use the Monitor tool
to see how a job is progressing.
10. Job Management and Deployment (10%)
1. Demonstrate knowledge of advanced find.
1. Try finding column names, table names and wild card searching of parts of
job names.
2. Given a scenario, demonstrate knowledge and the purpose of impact analysis.
1. Try some impact analysis search and reporting from inside the DataStage
Designer.
3. Demonstrate knowledge and purpose of job compare.
1. Try comparing jobs in different projects that have minor changes.
4. Given a scenario, articulate the change control process.
1. Since IBM does not have a change control process in DataStage 8 this is kind of
a stupid criterion. IBM decommissioned the change control tool from version
7.5, the new change control tool isn't ready yet, and an Excel-based change
control tool developed internally is not being released to the public. The only
change control in version 8 is the manual export and import of components
between projects.
2. Read up on how to lock a project to be read only. Lock down a project and
see how you can still set job parameter defaults.
11. Job Control and Runtime Management (10%)
1. Demonstrate knowledge of message handlers.
1. Find a warning message and an error message in a DataStage job. Right
mouse click on the message and add it to a job message handler and
downgrade it to a lower level. Re-run the job to see what happens.
2. Read up on the difference between job message handlers and project message
handlers.
2. Identify the use of dsjob command line utility.
1. Read the DataStage Server Job Developers Guide for a description of the
dsjob command.
2. Do a search for dsjob on www.dsxchange.com to see all the sample code on
how DSJob works.
3. Use the DataStage Director to create a new DataStage Batch job - add a job
to the batch code using the combo box provided to generate your own dsjob
code.
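A few typical dsjob calls look like this sketch (the project and job names are made up, and the commands assume a DataStage server with the dsenv environment sourced):

```
# Run a job with a parameter and wait for its completion status
dsjob -run -jobstatus -param SourceDir=/data/in MyProject MyJob

# Report the current status and run details of the job
dsjob -jobinfo MyProject MyJob

# Summarise the job log
dsjob -logsum MyProject MyJob
```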
3. Given a scenario, demonstrate ability to use job sequencers (e.g., exception hunting,
re-startable, dependencies, passing return value from routing, parameter passing and
job status).
1. Try a scenario where you have several jobs in a row and one of them fails -
see what happens when you restart the sequence job. See what happens when
you reset the sequence job.
2. Compare an aborted sequence job to an aborted parallel job.
