Teradata Join Processing


Center of Excellence
Data Warehousing
Wipro Technologies

Rows to be joined must be on the same AMP.
For join processing, copies of some or all of the rows may have to be moved to a common AMP.
Join plans:
Product join
Merge join
Nested join

General scenarios:
Join column is the PI of both tables.
Join column is the PI of one of the tables.
Join column is not the PI of either table.

Join column is the PI of both tables

Rows taking part in the join are already on the same AMP.
No data movement is necessary.
Rows are already in sorted order (within the block).
This is the best-case scenario.

Join column is the PI of one of the tables

One table already has its rows on the target AMPs.
Rows of the other table need to be redistributed to their target AMPs by the hash code of the join column value.
If that table is small, the optimizer may choose to duplicate it on all AMPs instead.

Join column is not the PI of either table

Rows of both tables need to be redistributed to their target AMPs by the hash code of the join column value.
The optimizer might instead choose to duplicate the smaller table on all AMPs.
This join scenario involves the most data movement.
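
The redistribution described above can be sketched in a few lines. This is a minimal illustration only: the number of AMPs, the sample tables and the use of CRC32 as the hash are assumptions made for the example, not Teradata's actual row-hash algorithm.

```python
import zlib
from collections import defaultdict

NUM_AMPS = 4  # hypothetical system size, chosen only for illustration


def amp_for(value, num_amps=NUM_AMPS):
    """Map a join-column value to an AMP number.

    CRC32 is a stand-in here; it is NOT Teradata's actual row-hash function.
    """
    return zlib.crc32(repr(value).encode("utf-8")) % num_amps


def redistribute(rows, join_col, num_amps=NUM_AMPS):
    """Return {amp: [rows]} with each row placed by the hash of its join column."""
    spool = defaultdict(list)
    for row in rows:
        spool[amp_for(row[join_col], num_amps)].append(row)
    return spool


# Hypothetical sample tables joined on dept_id.
employees = [
    {"emp_id": 1, "dept_id": 10},
    {"emp_id": 2, "dept_id": 20},
    {"emp_id": 3, "dept_id": 10},
]
departments = [
    {"dept_id": 10, "name": "Sales"},
    {"dept_id": 20, "name": "Ops"},
]

emp_spool = redistribute(employees, "dept_id")
dept_spool = redistribute(departments, "dept_id")

# Each AMP can now join its own spool rows without further data movement.
for amp in sorted(emp_spool):
    print(amp, emp_spool[amp], dept_spool.get(amp, []))
```

Because both tables are hashed on the same join column value, rows with equal dept_id values always land on the same AMP, which is the precondition stated at the start of the deck.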

Nested join
The optimizer chooses this join strategy when the query has:
An equality value for a unique index (UPI or USI) on table 1.
A join on a column of that single row to any index on table 2.
This join uses the least system resources.
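
As a rough sketch of why this strategy is cheap, the two conditions above amount to two index lookups. The dictionaries below are hypothetical stand-ins for a unique index on table 1 and an index on table 2; they do not model AMPs or spool.

```python
# Table 1: unique index on emp_id (a dict key holds at most one row per value).
employee_by_id = {
    1: {"emp_id": 1, "dept_id": 10},
    2: {"emp_id": 2, "dept_id": 20},
}

# Table 2: any index on dept_id (may point to several rows per value).
departments_by_dept_id = {
    10: [{"dept_id": 10, "name": "Sales"}],
    20: [{"dept_id": 20, "name": "Ops"}],
}


def nested_join(emp_id):
    """Fetch one row via the unique index, then probe table 2's index with it."""
    emp = employee_by_id.get(emp_id)  # equality value on the unique index
    if emp is None:
        return []
    matches = departments_by_dept_id.get(emp["dept_id"], [])
    return [{**emp, **dept} for dept in matches]


print(nested_join(1))
```

Only the single qualifying row of table 1 and its matching rows of table 2 are touched; no table is scanned, redistributed or sorted.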

Product join
The most general form of join.
The optimizer chooses a product join under the following conditions:
The WHERE clause is missing.
The join condition is not based on equality.
Join conditions are ORed together.
Table aliases are used incorrectly.
The optimizer determines that it is less expensive than the other join types.
Steps:
Identify the smaller table and duplicate it in spool on all AMPs.
Join each spool row of the smaller table to every row of the larger table.
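
The two steps above can be sketched as follows. The table contents, the per-AMP layout and the range condition are hypothetical; the point is only that every spool row is compared with every local row of the larger table.

```python
def product_join(small_rows, large_rows_by_amp, condition):
    """Duplicate the small table on every AMP and compare every row pair."""
    result = []
    comparisons = 0
    for amp, large_rows in large_rows_by_amp.items():
        small_spool = list(small_rows)  # this AMP's duplicated copy
        for s in small_spool:
            for l in large_rows:
                comparisons += 1
                if condition(s, l):
                    result.append((s, l))
    return result, comparisons


# Hypothetical data: a non-equality (range) join, a typical product join case.
bands = [
    {"band": "low", "lo": 0, "hi": 100},
    {"band": "high", "lo": 100, "hi": 1000},
]
orders_by_amp = {
    0: [{"order": 1, "amount": 50}],
    1: [{"order": 2, "amount": 250}, {"order": 3, "amount": 99}],
}

pairs, comparisons = product_join(
    bands, orders_by_amp, lambda s, l: s["lo"] <= l["amount"] < s["hi"]
)
print(pairs, comparisons)
```

With 2 band rows and 3 order rows the sketch performs 2 x 3 = 6 comparisons, which is why the cost of a product join grows with the product of the two row counts.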

Merge join
Commonly done when the join conditions are based on equality.
Generally more efficient than a product join because fewer row comparisons are needed.
Steps:
Identify the smaller table.
Put the qualifying rows from one or both tables into spool.
Move the spool rows to the AMPs based on the join column hash (if required).
Sort the spool rows by join column hash value (if necessary).
Compare those rows with matching join column hash values.
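
The sort and compare steps above can be sketched for a single AMP as below. This is an illustration under stated assumptions: CRC32 stands in for the row hash, the sample rows are hypothetical, and redistribution to AMPs is assumed to have happened already.

```python
import zlib


def row_hash(value):
    """Stand-in hash for the join column value (not Teradata's row hash)."""
    return zlib.crc32(repr(value).encode("utf-8"))


def merge_join(left_rows, right_rows, join_col):
    """Sort both inputs by join-column hash, then match rows in one pass."""
    left = sorted(left_rows, key=lambda r: row_hash(r[join_col]))
    right = sorted(right_rows, key=lambda r: row_hash(r[join_col]))
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lh = row_hash(left[i][join_col])
        rh = row_hash(right[j][join_col])
        if lh < rh:
            i += 1
        elif lh > rh:
            j += 1
        else:
            # Walk the run of right rows sharing this hash value.
            k = j
            while k < len(right) and row_hash(right[k][join_col]) == lh:
                # Compare actual values too, in case two values share a hash.
                if right[k][join_col] == left[i][join_col]:
                    result.append({**left[i], **right[k]})
                k += 1
            i += 1
    return result


employees = [
    {"emp_id": 1, "dept_id": 10},
    {"emp_id": 2, "dept_id": 20},
    {"emp_id": 3, "dept_id": 10},
    {"emp_id": 4, "dept_id": 30},
]
departments = [
    {"dept_id": 10, "name": "Sales"},
    {"dept_id": 20, "name": "Ops"},
]

joined = merge_join(employees, departments, "dept_id")
print(sorted((r["emp_id"], r["name"]) for r in joined))
```

Each row is compared only with rows whose hash matches, instead of with every row of the other table as in a product join.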

Join columns of both tables are PIs:
No redistribution needed.
No sorting needed.
Rows involved in the join are already located on the same AMP.

Join column is the PI of one of the tables:
One of the tables is already distributed on the join column row hash.
The optimizer redistributes the other table and sorts it on the join column row hash.

Duplicating the smaller table:
The smaller table is duplicated on all AMPs and sorted; the larger table is built locally and sorted.
The optimizer considers this strategy if it finds that redistributing the larger table is more expensive than duplicating the smaller table.
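
The trade-off above can be illustrated with a deliberately crude row-count model. The formulas below are assumptions made for this sketch (rows moved as the only cost, every redistributed row changing AMP); the optimizer's real cost estimates are more detailed.

```python
def rows_moved_by_redistribution(large_rows):
    """Assume, at worst, every row of the larger table moves to another AMP."""
    return large_rows


def rows_moved_by_duplication(small_rows, num_amps):
    """A full copy of the smaller table is placed on each AMP."""
    return small_rows * num_amps


def cheaper_strategy(small_rows, large_rows, num_amps):
    """Pick the strategy that moves fewer rows under this simplified model."""
    dup = rows_moved_by_duplication(small_rows, num_amps)
    redis = rows_moved_by_redistribution(large_rows)
    return "duplicate smaller table" if dup < redis else "redistribute larger table"


# Hypothetical sizes on a 10-AMP system.
print(cheaper_strategy(100, 1_000_000, 10))
print(cheaper_strategy(500_000, 1_000_000, 10))
```

In the first call duplication moves 1,000 rows against 1,000,000 for redistribution, so duplication wins; in the second it would move 5,000,000 rows, so redistribution wins.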

Product join scenario:
Duplicate the smaller table on every AMP.
The optimizer chooses this strategy when the join condition is not based on equality.

EXPLAIN
Provides an English translation of the steps chosen by the optimizer.
Very helpful for estimating the performance of complex queries.
Helps physical designers with index selection by showing the execution strategy chosen by the optimizer.
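
EXPLAIN output is generally clear and easy to understand; however, it contains a few phrases one needs to be familiar with.
"... with no residual conditions ...": There are no residual conditions other than the conditions used to locate the rows.
"... eliminating duplicates ...": A DISTINCT operation is being done.
"... we do a SMS ...": A set manipulation step; set operations such as UNION or EXCEPT are being done.
"... we do a BMSMS ...": A bit map set manipulation step; NUSI bit mapping is being used.
"... distributed by hash code to all AMPs ...": Rows are being redistributed to AMPs by the hash code of the join column value.
"... duplicated on all AMPs ...": A copy of the (smaller) table's rows is being placed in spool on every AMP.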

The optimizer needs demographic information to create the best execution plan for a query:
Number of rows in the table.
Row size.
Number of rows per value.
Index information and demographics.
Based on the statistics, the optimizer estimates the cost and creates the best plan.
Statistics must be collected for the columns and indexes that are accessed frequently.
If statistics are not provided, the optimizer does dynamic sampling (random AMP).
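
To see why collected statistics are preferable to sampling, here is a simplified sketch of the random-AMP idea: count the rows on one randomly chosen AMP and scale by the number of AMPs. The function, the single sampled AMP and the row counts are assumptions for illustration, not the optimizer's actual procedure.

```python
import random


def estimate_row_count(rows_per_amp, rng=random):
    """Estimate the table size from the row count of one randomly chosen AMP."""
    sampled_amp = rng.choice(sorted(rows_per_amp))
    return rows_per_amp[sampled_amp] * len(rows_per_amp)


# Hypothetical per-AMP row counts; both tables hold 1,000 rows in total.
even_table = {0: 250, 1: 260, 2: 240, 3: 250}
skewed_table = {0: 10, 1: 10, 2: 10, 3: 970}

print(estimate_row_count(even_table))
print(estimate_row_count(skewed_table))
```

For the evenly distributed table any sampled AMP gives an estimate close to 1,000. For the skewed table the estimate is either 40 or 3,880 depending on the AMP sampled, which shows how sampling can mislead the optimizer when data is unevenly distributed.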