
Data Stage Interview Questions Part 1

Q.What is the flow of loading data into fact & dimensional tables?
A.
1.Fact table - Table with a collection of foreign keys corresponding to the primary keys in the dimension tables. Consists mainly of fields with numeric measure values.
2.Dimension table - Table with a unique primary key.
3.Load - Data should first be loaded into the dimension tables. Based on the primary key values in the dimension tables, the data is then loaded into the fact table.
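The load order above can be illustrated with a small Python sketch (an illustration only, not DataStage code; the sample rows, column names, and surrogate-key scheme are made up):

```python
# Assumed sample source rows (hypothetical data)
source_rows = [
    {"customer": "ACME", "amount": 100},
    {"customer": "Globex", "amount": 250},
    {"customer": "ACME", "amount": 75},
]

# 1. Load the dimension table first, assigning each natural key
#    a unique primary (surrogate) key.
dimension = {}  # natural key -> primary key
for row in source_rows:
    if row["customer"] not in dimension:
        dimension[row["customer"]] = len(dimension) + 1

# 2. Load the fact table, replacing each natural key with the
#    dimension's primary key (a foreign key in the fact table).
fact = [{"customer_key": dimension[r["customer"]], "amount": r["amount"]}
        for r in source_rows]

print(fact)
# -> [{'customer_key': 1, 'amount': 100}, {'customer_key': 2, 'amount': 250},
#     {'customer_key': 1, 'amount': 75}]
```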

Q.What is the default cache size? How do you change the cache size if needed?
A. Default cache size is 256 MB. We can change it by going into Datastage Administrator, selecting the Tunables tab, and specifying the cache size there.

Q.What are types of Hashed File?


A.
Hashed Files are broadly classified into 2 types.
1.Static - Subdivided into 17 types based on the primary key pattern.
2.Dynamic - Subdivided into 2 types:
a.Generic
b.Specific

Dynamic files do not perform as well as a well-designed static file, but they do perform better than a badly designed one. When creating a dynamic file you can specify the following, although all of these have default values. By default, a hashed file is created as Dynamic (Type 30).

Q.What does a Config File in parallel extender consist of?


A.Config file consists of the following.
1.Number of Processes or Nodes.
2.Actual Disk Storage Location.

Q.What is Modulus and Splitting in Dynamic Hashed File?


A.In a dynamic hashed file, the amount of data keeps changing over time. The "Modulus" is the number of groups the file is divided into. As the file grows, groups are split and the modulus increases; this is called "Splitting". As the file shrinks, groups are merged and the modulus decreases.
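A dynamic hashed file keeps a modulus (number of groups) that grows by splitting groups as data is added and shrinks by merging them as data is removed. This can be sketched as a toy Python model (made-up thresholds, not DataStage's actual algorithm):

```python
class DynamicHashedFile:
    """Toy model of a dynamic hashed file's modulus adjustment."""

    def __init__(self, records_per_group=2):
        self.modulus = 1              # current number of groups
        self.records = {}
        self.records_per_group = records_per_group

    def _rebalance(self):
        load = len(self.records) / self.modulus
        if load > self.records_per_group:
            self.modulus += 1         # splitting: the file grew
        elif load < self.records_per_group / 2 and self.modulus > 1:
            self.modulus -= 1         # merging: the file shrank

    def write(self, key, value):
        self.records[key] = value
        self._rebalance()

f = DynamicHashedFile()
for k in "abcdef":
    f.write(k, k.upper())
print(f.modulus)  # -> 3: the modulus has grown with the data
```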

Q.What are Stage Variables, Derivations and Constraints?


A.
1.Stage Variable - An intermediate processing variable that retains its value across rows and is not passed to a target column.
2.Derivation - An expression that specifies the value to be passed on to the target column.
3.Constraint - A condition that evaluates to true or false and controls the flow of data down a link.

Q.What are the types of views in Datastage Director?


A.There are 3 types of views in Datastage Director
1.Job View - Jobs and the dates they were compiled and run.
2.Status View - Status of a job's last run.
3.Log View - Warning messages, event messages, program-generated messages.


Q.What are the types of Parallel Processing?


A.Parallel Processing is broadly classified into 2 types.
1.SMP - Symmetrical Multi Processing.
2.MPP - Massively Parallel Processing.

Q.Differentiate between Orchestrate and Datastage Parallel Extender?


A.Orchestrate is itself an ETL tool with extensive parallel processing capabilities, running on the UNIX platform. Datastage used Orchestrate with Datastage XE (beta version of 6.0) to incorporate parallel processing capabilities. The Datastage vendor then purchased Orchestrate and integrated it with Datastage XE, releasing a new version, Datastage 6.0, i.e. the Parallel Extender.

Q.What is the importance of Surrogate Key in Data warehousing?


A.A Surrogate Key is a primary key for a dimension table. Its main advantage is that it is independent of the underlying database, i.e. a surrogate key is not affected by changes going on in the source data.

Q.How to run a Shell Script within the scope of a Data stage job?
A.By using the "ExecSH" command in the Before/After job subroutine properties.

Q.How to handle Date conversions in Datastage? Convert a mm/dd/yyyy format to yyyy-dd-mm?
A.We use
1."Iconv" function - Internal conversion (parses the string into Datastage's internal date format).
2."Oconv" function - External conversion (formats the internal value back out).
The function to convert mm/dd/yyyy format to yyyy-dd-mm is
Oconv(Iconv(Fieldname,"D/MDY[2,2,4]"),"D-YDM[4,2,2]")
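Outside Datastage, the same conversion can be sketched in Python; `strptime` plays the role of Iconv (parse to an internal value) and `strftime` the role of Oconv (format back out). The sample date is made up:

```python
from datetime import datetime

def convert(date_str: str) -> str:
    """Convert a date from mm/dd/yyyy to yyyy-dd-mm."""
    internal = datetime.strptime(date_str, "%m/%d/%Y")  # like Iconv
    return internal.strftime("%Y-%d-%m")                # like Oconv

print(convert("07/04/1999"))  # -> 1999-04-07
```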

Q.How do you execute a Datastage job from the command line prompt?


A.Using "dsjob" command as follows.
dsjob -run -jobstatus projectname jobname

Q.What is the functionality of Link Partitioner and Link Collector?


A.
1.Link Partitioner: It actually splits data into various partitions or data flows using various
partition methods.
2.Link Collector: It collects the data coming from the partitions, merges it into a single data flow and loads it into the target.
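The partition-then-collect flow can be sketched in Python (an illustration of the idea only, not how Datastage implements it; the hash function and row layout are made up):

```python
def partition(rows, key, n):
    """Link Partitioner: split rows into n partitions by hashing a key.
    A deterministic character-sum hash is used for illustration."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[sum(ord(c) for c in str(row[key])) % n].append(row)
    return parts

def collect(parts):
    """Link Collector: merge the partitions back into one data flow."""
    merged = []
    for p in parts:
        merged.extend(p)
    return merged

rows = [{"id": "A", "v": 1}, {"id": "B", "v": 2}, {"id": "C", "v": 3}]
parts = partition(rows, "id", 2)
assert sorted(r["v"] for r in collect(parts)) == [1, 2, 3]
```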

Q.What are the Types of Dimensional Modeling?


A.Dimensional modeling is again sub divided into 2 types.
1.Star Schema - Simple & Much Faster. Denormalized form.
2.Snowflake Schema - Complex with more Granularity. More normalized form.

Q.Differentiate Primary Key and Partition Key?


A.A Primary Key is a combination of unique and not null. It can be a collection of key values, called a composite primary key. A Partition Key is just a part of the Primary Key. There are several


partitioning methods, such as Hash, DB2, Random, etc. When using Hash partitioning we specify the Partition Key.

Q.Explain Containers Usage and Types?


A.A Container is a collection of stages used for the purpose of reusability. There are 2 types of Containers.
1.Local Container: Job Specific
2.Shared Container: Used in any job within a project.

Q.Compare and Contrast ODBC and Plug-In stages?


A.
ODBC
1.Poor Performance.
2.Can be used for Variety of Databases.
3.Can handle Stored Procedures.

Plug-In:
1.Good Performance.
2.Database specific. (Only one database)
3.Cannot handle Stored Procedures.

Q.Explain Data Modelling types along with their significance?


A.Data Modelling is Broadly classified into 2 types.
1.E-R Diagrams (Entity - Relationships).
2.Dimensional Modelling.

Q.Explain Data Stage Architecture?


A.Data Stage contains two components,
1.Client Component.
2.Server Component.

Client Component:
a.Data Stage Administrator
b.Data Stage Manager
c.Data Stage Designer
d.Data Stage Director

Server Components:
a.Data Stage Engine
b.Meta Data Repository
c.Package Installer

Q.What is the role of Data Stage Administrator?


A.Data Stage Administrator:
1.Used to create the project.
2.Contains set of properties


3.We can set the buffer size (by default 128 KB).
4.We can increase the buffer size.
5.We can set the Environment Variables.
6.In the Tunables tab we have in-process and inter-process buffering:
7.In-process - stages in a job pass rows to one another within a single process.
8.Inter-process - stages run as separate processes and exchange rows through a buffer, so a stage reads the data as it arrives.
9.It also provides an interface to the metadata.

Q.What is the role of Datastage Manager?


A.Data Stage Manager:
1.We can view and edit the Meta data Repository.
2.We can import table definitions.
3.We can export the Data stage components in .xml or .dsx format.
4.We can create routines and transforms
5.We can compile multiple jobs.

