Center of Excellence
Data Warehousing
Wipro Technologies
 
 
• Rows to be joined must be on the same AMP.

• For join processing, copies of some or all of the rows may have to be moved to a common AMP.

• Join plans:
    • Product join
    • Merge join
    • Nested join
 
 
• General scenarios:
    • Join column is the PI of both tables.
    • Join column is the PI of one of the tables.
    • Join column is not the PI of either table.


Join column is the PI of both tables
 
• Rows taking part in the join are already on the same AMP.

• No data movement is necessary.

• Rows are already in sorted order (within the block).

• This is the best-case scenario.
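
The co-location described above can be illustrated with a minimal Python sketch. It is an assumption-laden toy, not Teradata's implementation: `zlib.crc32` stands in for the real row hash, and `NUM_AMPS`, `amp_for`, `distribute` and the sample tables are hypothetical names introduced only for illustration.

```python
import zlib

NUM_AMPS = 4  # hypothetical system size


def amp_for(value, num_amps=NUM_AMPS):
    """Stand-in for the row hash: map a primary index value to an AMP number."""
    return zlib.crc32(str(value).encode("utf-8")) % num_amps


def distribute(rows, pi_column, num_amps=NUM_AMPS):
    """Place each row on the AMP chosen by hashing its primary index column."""
    amps = {amp: [] for amp in range(num_amps)}
    for row in rows:
        amps[amp_for(row[pi_column], num_amps)].append(row)
    return amps


# Both hypothetical tables use dept_no as their primary index.
employees = [{"dept_no": d, "name": f"emp{d}"} for d in (100, 200, 300, 400)]
departments = [{"dept_no": d, "dept_name": f"dept{d}"} for d in (100, 200, 300, 400)]

emp_amps = distribute(employees, "dept_no")
dept_amps = distribute(departments, "dept_no")

# Rows with the same dept_no hash to the same AMP in both tables,
# so every AMP can join its own rows without any data movement.
colocated = all(
    {r["dept_no"] for r in emp_amps[amp]} == {r["dept_no"] for r in dept_amps[amp]}
    for amp in range(NUM_AMPS)
)
print(colocated)
```

Because both tables are hashed on the same column with the same function, matching values can never end up on different AMPs.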


Join column is the PI of one of the tables
• One table has its rows on the target AMP.

• Rows of the other table need to be redistributed to their target AMPs by the hash code of the join column value.

• If the table is small, the optimizer may choose to duplicate it on all AMPs.
Join column is not the PI of either table
• Rows of both tables need to be redistributed to their target AMPs by the hash code of the join column value.

• The optimizer might choose to duplicate the smaller table on all AMPs.

• This join scenario involves the most data movement.
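
To make the difference in data movement between the scenarios concrete, here is a rough Python sketch that counts how many rows would have to leave their home AMP when a table is redistributed by the join column hash. Everything in it is assumed for illustration: `zlib.crc32` replaces the real hashing algorithm, and the tables, column names and `rows_to_move` helper are hypothetical.

```python
import zlib

NUM_AMPS = 4  # hypothetical number of AMPs


def amp_for(value, num_amps=NUM_AMPS):
    """Stand-in for the row hash: map a column value to an AMP number."""
    return zlib.crc32(str(value).encode("utf-8")) % num_amps


def rows_to_move(rows, pi_column, join_column):
    """Count rows whose home AMP (by PI hash) differs from the AMP of the join column hash."""
    return sum(
        1 for row in rows if amp_for(row[pi_column]) != amp_for(row[join_column])
    )


# Hypothetical tables: customers has PI cust_id, orders has PI order_id.
customers = [{"cust_id": c, "region_id": c % 3} for c in range(1, 21)]
orders = [
    {"order_id": o, "cust_id": (o % 20) + 1, "region_id": o % 3}
    for o in range(1, 101)
]

# Join on cust_id: it is the PI of customers (no movement) but not of orders.
moved_customers = rows_to_move(customers, "cust_id", "cust_id")
moved_orders = rows_to_move(orders, "order_id", "cust_id")

# Join on region_id: it is the PI of neither table, so both may have to move.
moved_both = rows_to_move(customers, "cust_id", "region_id") + rows_to_move(
    orders, "order_id", "region_id"
)
print(moved_customers, moved_orders, moved_both)
```

The exact counts depend on the stand-in hash function; the point is only that a table joined on its own PI never moves, while a table joined on another column generally does.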
  
Nested join

• The optimizer chooses this join strategy when the query has:
    • An equality value for a unique index (UPI or USI) on table 1.
    • A join on a column of that single row to any index on table 2.
• This join uses the fewest system resources.
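
The two conditions above can be pictured with a small Python sketch: a single-row lookup through a unique index on table 1, followed by an index lookup on table 2. The dictionaries play the role of indexes, and all table and function names are hypothetical; this is a conceptual model, not how the database stores indexes.

```python
from collections import defaultdict

# Hypothetical tables.
employees = [
    {"emp_no": 1, "dept_no": 10},
    {"emp_no": 2, "dept_no": 20},
]
departments = [
    {"dept_no": 10, "dept_name": "Sales"},
    {"dept_no": 20, "dept_name": "HR"},
]

# Unique index on table 1 (emp_no): at most one row per value.
emp_by_no = {e["emp_no"]: e for e in employees}

# Index on the join column of table 2 (dept_no): one value may map to several rows.
dept_index = defaultdict(list)
for d in departments:
    dept_index[d["dept_no"]].append(d)


def nested_join(emp_no):
    """Fetch one row of table 1 by unique index, then probe the index of table 2."""
    emp = emp_by_no.get(emp_no)  # equality value on the unique index
    if emp is None:
        return []
    # Use the join column of that single row to look up table 2 by index.
    return [{**emp, **d} for d in dept_index.get(emp["dept_no"], [])]


print(nested_join(1))
```

Only the rows reached through the two index lookups are touched, which is why this plan needs so few resources.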
         
         
       
    
  
          
          

 
Product join

• Most general form of join.

• The optimizer chooses a product join under the following conditions:
    • The WHERE clause is missing.
    • The join condition is not based on equality.
    • Join conditions are ORed together.
    • Table aliases are used incorrectly.
    • The optimizer determines that it is less expensive than other join types.

• Identify the smaller table and duplicate it in spool on all AMPs. Join each spool row of the smaller table to every row of the larger table.
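
As a rough illustration of the row-by-row comparison described above, the following Python sketch joins two small lists on a non-equality condition and counts the comparisons. It models a single AMP only; the function name, the tables and the range condition are hypothetical examples.

```python
def product_join(small_rows, large_rows, condition):
    """Compare every row of the smaller table with every row of the larger table."""
    comparisons = 0
    result = []
    for small in small_rows:
        for large in large_rows:
            comparisons += 1
            if condition(small, large):
                result.append((small, large))
    return result, comparisons


# Hypothetical non-equality join: match each salary to the grade whose range contains it.
grades = [
    {"grade": "A", "low": 0, "high": 50},
    {"grade": "B", "low": 51, "high": 100},
]
salaries = [{"emp": "e1", "salary": 30}, {"emp": "e2", "salary": 70}, {"emp": "e3", "salary": 90}]

matches, comparisons = product_join(
    grades, salaries, lambda g, s: g["low"] <= s["salary"] <= g["high"]
)
print(len(matches), comparisons)
```

The number of comparisons is always the product of the two row counts, whatever the condition, which is what makes this plan expensive for large tables.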
Merge join

• Commonly done when the join conditions are based on equality.

• Generally more efficient than a product join because fewer row comparisons are needed.

• Steps:
    • Identify the smaller table.
    • Put the qualifying rows from one or both tables into spool.
    • Move the spool rows to the AMPs based on join column hash (if required).
    • Sort the spool rows by join column hash value (if necessary).
    • Compare those rows with matching join column hash values.
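
The sort and compare steps above can be sketched in Python for the rows that have already arrived on one AMP. This is a simplified model under stated assumptions: `zlib.crc32` stands in for the row hash, and `row_hash`, `merge_join` and the sample tables are hypothetical names.

```python
import zlib


def row_hash(value):
    """Stand-in for the join column row hash."""
    return zlib.crc32(str(value).encode("utf-8"))


def merge_join(left, right, left_col, right_col):
    """Sort both inputs by join column hash, then match rows with equal hashes."""
    left_sorted = sorted(left, key=lambda r: row_hash(r[left_col]))
    right_sorted = sorted(right, key=lambda r: row_hash(r[right_col]))
    result = []
    i = j = 0
    while i < len(left_sorted) and j < len(right_sorted):
        lh = row_hash(left_sorted[i][left_col])
        rh = row_hash(right_sorted[j][right_col])
        if lh < rh:
            i += 1
        elif lh > rh:
            j += 1
        else:
            # Find the run of rows sharing this hash value on each side.
            i_end = i
            while i_end < len(left_sorted) and row_hash(left_sorted[i_end][left_col]) == lh:
                i_end += 1
            j_end = j
            while j_end < len(right_sorted) and row_hash(right_sorted[j_end][right_col]) == lh:
                j_end += 1
            for l_row in left_sorted[i:i_end]:
                for r_row in right_sorted[j:j_end]:
                    # Re-check the real values in case two values share a hash.
                    if l_row[left_col] == r_row[right_col]:
                        result.append({**l_row, **r_row})
            i, j = i_end, j_end
    return result


employees = [
    {"name": "Ann", "dept_no": 10},
    {"name": "Bob", "dept_no": 20},
    {"name": "Cid", "dept_no": 10},
    {"name": "Dee", "dept_no": 30},
]
departments = [
    {"dept_no": 10, "dept_name": "Sales"},
    {"dept_no": 20, "dept_name": "HR"},
    {"dept_no": 40, "dept_name": "Ops"},
]

joined = merge_join(employees, departments, "dept_no", "dept_no")
print(sorted((r["name"], r["dept_name"]) for r in joined))
```

Because both inputs are sorted on the same hash, each side is scanned once and only rows with matching hash values are compared.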
No redistribution (PI-to-PI join)
• No redistribution needed.

• No sorting needed.

• Join columns of both tables are PIs.

• Rows involved in the join are located on the same AMP.
Redistribute one table
• Redistribute and sort one of the tables on the join column row hash.

• Join column is the PI of one of the tables.

• One of the tables is already distributed on the join column row hash.

• The optimizer redistributes the other table and sorts it on the join column row hash.
Duplicate the smaller table
• Duplicate and sort the smaller table on all AMPs, and locally build the larger table and sort it.

• The optimizer considers this strategy if it finds that redistributing the larger table is more expensive than duplicating the smaller table.
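
The trade-off described above can be sketched with a deliberately crude cost model in Python. The model is an assumption made for illustration: it counts only row copies sent between AMPs, whereas the real optimizer weighs many more factors. `choose_strategy` and its parameters are hypothetical.

```python
def choose_strategy(small_rows, large_rows, num_amps):
    """Pick the cheaper option under a toy model that counts only rows sent.

    Assumptions (not the optimizer's real cost formula):
    - duplicating sends every row of the smaller table to every AMP;
    - redistributing sends, at worst, every row of the larger table once.
    """
    duplicate_cost = small_rows * num_amps
    redistribute_cost = large_rows
    if duplicate_cost < redistribute_cost:
        return "duplicate smaller table"
    return "redistribute larger table"


# A tiny lookup table against a large table: duplication wins.
print(choose_strategy(small_rows=100, large_rows=1_000_000, num_amps=10))

# Two tables of similar size: duplication would send far more rows.
print(choose_strategy(small_rows=500_000, large_rows=1_000_000, num_amps=10))
```

Under this model, duplication only pays off when the smaller table is smaller than the larger one by more than a factor of the number of AMPs.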
Duplicate the smaller table (product join)
• Duplicate the smaller table on every AMP.

• The optimizer chooses this strategy when the join condition is not based on equality.

• Product join scenario.


EXPLAIN

• Provides an English translation of the steps chosen by the optimizer.

• Very helpful for estimating the performance of complex queries.

• Helps physical designers with index selection by showing the execution strategy chosen by the optimizer.
• EXPLAIN output is generally clear and easy to understand; however, it contains a few phrases one needs to be familiar with:
    • "...with no residual conditions...": There are no residual conditions other than the conditions used to locate the row.
    • "...eliminating duplicates...": A DISTINCT operation is being done.
    • "...we do a SMS...": Set manipulations such as UNION or EXCEPT are being done.
    • "...we do a BMSMS...": NUSI bit mapping is being used.
    • "...distributed by hash code to all AMPs...": Rows are being redistributed to their target AMPs by hash code.
    • "...duplicated on all AMPs...": A copy of the table is being placed on every AMP.
 
Statistics

• The optimizer needs demographic information to create the best execution plan for a query:
    • Number of rows in the table.
    • Row size.
    • Number of rows per value.
    • Index information and demographics.

• Based on the statistics, the optimizer estimates the cost and creates the best plan.

• Statistics must be collected for the columns and indexes that are accessed frequently.

• If statistics are not provided, the optimizer does dynamic sampling (random AMP).
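
To show why a random-AMP sample can mislead when collected statistics are missing, here is a simplified Python sketch that extrapolates a table's row count from one randomly chosen AMP. It is a conceptual model only; the function name and the per-AMP counts are hypothetical, and the real sampling logic is more involved.

```python
import random


def estimate_row_count(rows_per_amp, rng=random):
    """Estimate total rows by sampling one AMP and scaling by the number of AMPs."""
    sampled_amp = rng.randrange(len(rows_per_amp))
    return rows_per_amp[sampled_amp] * len(rows_per_amp)


# Evenly distributed table: any AMP gives the exact total.
even = [250, 250, 250, 250]
print(estimate_row_count(even))

# Skewed table: the estimate depends heavily on which AMP is sampled.
skewed = [10, 10, 10, 970]
print(estimate_row_count(skewed))
```

With an even distribution the sample is accurate, but with skewed data the estimate can be far too low or far too high, which is one reason to collect statistics on frequently accessed columns and indexes.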