You are on page 1of 10

Database Design > Primary Indexes

Partitioned Primary Indexes


The partitioned primary index is an extension to the unpartitioned primary index. You can define both single-level
and multilevel partitioning for global temporary and volatile tables, for standard temporal and non-temporal base
tables (but not queue tables or hash indexes), and for non-compressed join indexes.
Note: Column-Partitioned

Tables and Join Indexes

Optimal partitioning expressions are typically coded using CASE_N or RANGE_N expressions based on exact
numeric, character or DateTime columns. DateTime expressions can include the BEGIN and END bound
functions and the DATE, CURRENT_DATE, and CURRENT_TIMESTAMP functions.
The test value you specify with a RANGE_N function must result in a BYTEINT, SMALLINT, INTEGER, BIGINT,
DATE, TIMESTAMP(n), TIMESTAMP(n) WITH TIME ZONE, CHARACTER, VARCHAR, GRAPHIC, or
VARGRAPHIC data type.
Support for the BEGIN and END bound functions also includes support for the IS [NOT] UNTIL_CHANGED and
IS [NOT] UNTIL_CLOSED functions.
If you specify both a BEGIN and END bound function in a partitioning expression, then the conditions only match
when equality exists for both bounds. You can specify any valid Date or Time element for the IS [NOT]
UNTIL_CHANGED and IS [NOT] UNTIL_CLOSED functions in partitioning expressions based on the CASE_N
function.
For recommendations on optimal ways to row-partition the primary index of the various types of temporal tables,
see Temporal Table Support.
Partitioning is designed to optimize range queries while also providing efficient primary index join strategies.
Analyze your range query optimization needs carefully because there are performance tradeoffs between specific
range access improvements and possible decrements for primary index accesses and joins and aggregations on
the primary index that occur as a function of the number of populated row partitions.
When a table or join index is created with partitioning, its rows are hashed to the appropriate AMPs and then
assigned to their computed internal partition number based on the value of the partitioning expressions defined
by the user when the table was created or altered. The evaluation of the partitioning expression determines the
partition number for a row. Once a row has been dispatched to its home AMP, Teradata Database converts the
partition numbers from each partitioning level to an internal partition number that is stored in the rowID.
The value of a PARTITION or PARTITION#Ln is determined by reading the partition number field in the rowID
and then converting that back to the appropriate combined partition number of the specified partitioning level
whenever you submit a query that specifies PARTITION or PARTITION#Ln in its select list.
In other words, the value you see as a PARTITION number when you select the PARTITION (or PARTITION#Ln
for multilevel partitioning) is actually not a column that is stored with the other columns of a table row, but is a
virtual column whose value is determined by decoding the bit pattern stored in the internal partition number of the
rowID.
The partitioning maxima using different types of partitioning expressions are listed in the following table. These
maxima apply to both row-partitioned tables and join indexes and to column-partitioned tables and join indexes.
Partitioning Parameter
Maximum Value

Combined partitions for a table with 2-byte partitioning.

Combined partitions for a table with 8-byte partitioning.

65,535

9,223,372,036,854,775,807

GE MSAT Internal

Valid range for the number of partitioning levels for a table or join
index with 2-byte partitioning.

1 - 15

Valid range for the number of partitioning levels for a table or join
index with 8-byte partitioning.

1 - 62

RANGE_N

65,535

Number of ranges plus NO RANGE [OR UNKNOWN] and


UNKNOWN partitions, with 2-byte partitioning.

RANGE_N

9,223,372,036,854,775,807

Number of ranges plus NO RANGE [OR UNKNOWN] and


UNKNOWN partitions, with 8-byte partitioning.

CASE_N
Number of conditions.

Bound by the request text size and other limits

When you have a partitioned table, the bits stored in the internal partition number of the rowID represent the
combined partition number of the row, which is determined by combining the partitioning expressions of a
PARTITION BY clause into a single expression. The expression that combines the partitioning expressions of
multilevel partitioning into a single combined partition number is called a combined partitioning expression.
The terms internal partition number and combined partition number are defined as follows.
Term
Definition

Internal
partition
number

A value Teradata Database calculates from the combined partition number.


The internal partition number is used to number partitions internally and is stored in the
rowID.
The internal and combined partition numbers can be identical if no modification is needed. A
partitioning expression that consists solely of a CASE_N or RANGE_N function, however
must be modified.
For single-level partitioning only, partitions for the NO RANGE [OR UNKNOWN], NO CASE
[OR UNKNOWN], and UNKNOWN options are placed at fixed internal partitions, followed by
partitions for range and conditions following.
Modification is required to retain existing internal partition numbers after an ALTER TABLE
request that drops or adds ranges or partitions.

Combined partition number

GE MSAT Internal

The value
computed
for the
partitionin
g
expressio
ns for a
row in a
partitione
d table.
For a table with
an
unpartitioned
primary index,
the combined
partition
number is
always 0.
Combining the
results for a
table with
single-level
partitioning, the
combined
partition
number is the
same as the
value of the
single
partitioning
expression.
When you
specify the
system-determi
ned columns
PARTITION or
PARTITION#Ln
in a request,
Teradata
Database
converts the
internal
partition
number stored
in the rowID
into the
appropriate
external
partition
number or the
partition
number of the
specified
partitioning
level and
returns that
value to the
requestor

Once assigned to a row partition, the rows are stored in row hash order, as illustrated by the following graphic,
which is best viewed on a color monitor and best printed on a color printer.

GE MSAT Internal

Teradata Database groups partitioned rows into partition groups on an AMP first by their internal partition number,
then by row hash value within each internal partition, and then by uniqueness value.
The following graphics illustrate the difference between an unpartitioned table and a partitioned table. Both tables
contain the same data for the year 2006. Each colored rectangle represents a row. Rows with the same color are
in the same month. Rows in the same box have the same row hash based on order_number, which is the
primary index column.
For an unpartitioned table, the rows are ordered only by their hash value. For a partitioning that is partitioned by
month, the rows are first ordered by month based on order_date and then ordered by their hash values for the
primary index column. Note that within an internal partition there are fewer rows per hash value.

If a SELECT request specifies values for all the primary index columns, the AMP that contains those rows can be
determined, and only one AMP needs to be accessed. If the query conditions are not specified on the partitioning
columns, then each internal partition can be probed to find the rows based on the hash value, assuming there is
no usable alternative index. If conditions are also specified on the partitioning columns, then row partition

GE MSAT Internal

elimination might further reduce the number of partitions to be probed on that AMP (see SQL Request and
Transaction Processing for information about row partition elimination).
If a SELECT request does not specify the values for all the primary index columns, an all-AMP, full-table scan is
required for a table with no partitioning when there is no usable alternative index. However, with partitioning, if
conditions are specified on the partitioning columns, row partition elimination might reduce what would otherwise
be an all-AMP, full-table scan to an all-AMP scan of only the internal partitions that are not eliminated. The extent
of row partition elimination depends on the partitioning expressions, the conditions specified in the query, and the
ability of the Optimizer to recognize such opportunities.
Suppose a query only requests orders dated August, 2010. For an unpartitioned table, the whole table is read to
determine which rows are from August because the August rows are scattered throughout the table. For the
partitioned table, on the other hand, row partition elimination can be used to exclude all the internal partitions that
do not contain activity for August. Because all the August rows can be grouped together in the partitioned table, it
becomes possible to position directly to the first row for August, then read sequentially until a September row is
found, at which point the system stops reading rows. This query requested about

of the rows, so only about

of the partitioned table must be read. If you partition by day and change the query to make the ending date
August 2nd, an even smaller subset of the table is all that must be read.
When rows for current day activity are inserted later tonight into the unpartitioned table, they are scattered
throughout the entire table, so the average hits per block value is very low. For the partitioned table on the other
hand, the inserts are clustered in a smaller area. This provides a better locality of reference. If the partitioning is
by week or by day, then the inserts are clustered even more tightly, and the hits per block value is very high.
If you want to delete the oldest month of data, the task is a full-table scan for the unpartitioned table, but for the
partitioned table the rows are cluster together, so it is possible to delete them all with a high number of average
hits per block. Transient journaling is not done for each of the rows, further enhancing performance when all the
rows of an internal partition are deleted as the last action of a transaction.
You can see that if the table is partitioned on order date, there is no need for a NUSI on order date, and if you
had one for the unpartitioned table, you could drop it when you converted the primary index for the table to a
partitioned primary index.
You should consider defining a table with partitioning to support either of the following workload characteristics:
A majority of the queries in the workload specify range predicates on some column, particularly a date column, of
the candidate partitioned table.
A majority of the queries in the workload specify an equality predicate on some column of the candidate partitioned
table, and that column is either:
Not the only column in the primary index column set
or
Not a primary index column.
In addition to these two workload characteristics, one of the following sets of characteristics of the primary index
should also be considered. One of these three cases is always true.
The primary index is used:
Primarily or exclusively to distribute rows evenly across the AMPs.
Rarely, if ever, to access rows or to join tables.
To access rows using a condition specified on a column that is suitable for partitioning the table.
or
The primary index, defined with the entire set of partitioning columns, is used:
To distribute rows evenly across the AMPs.
To access rows directly or as a table join condition.
or
The primary index, defined without the entire set of partitioning columns, is used:

GE MSAT Internal

To distribute rows evenly across the AMPs.


To access rows directly or as a table join condition.
In this case, you must pay particular attention to weighing and optimizing the performance tradeoffs the situation
makes possible.
The following factors characterizes the general performance of partitioned tables with respect to unpartitioned
tables.
Partitioned access by means of an equality constraint on all primary index and partitioning columns is as efficient
as the same access made by means of an unpartitioned primary index.
Partitioned access on an equality constraint on the primary index columns and an equality or other constraint on
the partitioning columns that limits access to a single partition, produces performance as efficient as that made
with an unpartitioned primary index.
Partitioned access with a constraint on the partitioning columns can approach that of unpartitioned primary index
access when all the primary index columns are specified with equality constraints on an expression that does not
reference any columns, depending on the extent of row partition elimination (an indirect measure of the number
of internal partitions that must be probed before the desired row is found) required.
Partitioned access via an equality condition on the primary index that neither includes all of the partitioning
columns nor makes a constraint on the partitioning columns might not be as efficient as access by means of an
unpartitioned primary index depending on the number of internal partitions that contain data, but is generally
more efficient than a full-table scan.
Partitioned access when not all of the primary index columns are specified with equality constraints, but there are
conditions specified on the partitioning columns that limit the number of internal partitions that must be scanned
is more efficient than access by an unpartitioned primary index, which requires a full-table scan.
Specifically, these are the major performance characteristics of partitioned tables with respect to unpartitioned
tables.
Inserting a large number of rows into a partitioned table can be much faster than the same insert operation into an
otherwise identical table, the exception being that it is defined without partitioning.
To optimize load performance, the first partitioning expression of the table preferably should match the row arrival
pattern of the load utility, which is most often based on transaction date. In this case, for example, the first
partitioning expression should be based on a DATE or TIMESTAMP column. It is generally, but not always, true
that if you do not match the table partitioning with the arrival pattern of the load utility, you will not achieve much
of an optimization by partitioning the table.
The following design considerations are also important for this partitioning performance characteristic.
Performance improvements decrease as a function of the number of secondary indexes defined on the target table
in a manner similar to the behavior of an unpartitioned table when increasingly more secondary indexes are
defined on it.
Performance improvements are greater as a function of how finely the granularity is defined for the partitions. For
example, partitions defined on weekly intervals are typically better than those defined on monthly intervals.
Accessing rows using conditions on partitioning columns can be faster than the same access to the same rows
when the table is defined with an unpartitioned primary index.
The performance of SELECT, UPDATE, and DELETE operations is improved when queries specify conditions on
the partitioning columns of the table because such predicates cause a higher average number of hits per data
block.
The following design considerations are also important for this partitioned performance characteristic.
The degree of improvement in the performance of select, update, and delete operations is proportional to the
extent of row partition elimination that can be realized (see SQL Request and Transaction Processing for details
about row partition elimination).
In other words, the more internal partitions that can be excluded by conditions specified on the partitioning
columns for a partitioned table, the better.
Performance improvements are generally greater as a function of how finely the granularity is defined for the
internal partitions.

GE MSAT Internal

Depending on the number of internal partitions that contain data, accessing rows by means of an equality
constraint on their primary index can be slower for a partitioned table than for the equivalent unpartitioned table
when you neither include all of the partitioning columns in the primary index definition nor specify a constraint on
those columns in the query.
The following design considerations are important for this partitioned performance characteristic.
The more coarse the granularity of the partitioning you define, or the more internal partitions eliminated, the less
performance degradation you are likely to experience.
Defining an appropriate secondary index on the partitioned table can sometimes minimize the performance
degradation of primary index access.
Joins can be different for partitioned and unpartitioned tables that are otherwise equivalent, but the effect of the
different join strategies that arise cannot be predicted easily in many cases.
The join plan the Optimizer pursues depends on the picture it has of the data demographics based on collected
statistics, dynamic-AMP samples, and derived statistics. The usual recommendation applies here: check
EXPLAIN reports to determine the best way to design your indexes to achieve the optimal join geography.
The following design considerations are important for this partitioned performance characteristic.
Primary index-to-primary index joins are more likely to generate different join plans than other
partitioned-unpartitioned join comparisons.
To minimize the potential for performance issues in making primary index-to-primary index joins, consider the
following guidelines:
Partition the two tables identically if possible.
A coarser granularity of partitions is likely to be superior to a finer partition granularity.
Examine your EXPLAIN reports to determine which join methods the Optimizer is selecting to join the tables.
Rowkey partitioned table joins are generally better than joins based on another family of join methods.
Efficient row partition elimination can often convert joins that would otherwise be poor performers into good
performers.
The most likely candidate for poor join performance is found when you are joining a partitioned table with an
unpartitioned table, the partitioned table partitioning column set is not defined in the unpartitioned table, there are
no predicates on the partitioned table partitioning column set, and there are many internal partitions defined for
the partitioned table.
The need for secondary indexes is often different for partitioned and unpartitioned tables that are otherwise
equivalent.
Several opposing scenarios present themselves for resolving the issues that might arise from the existence or
absence of secondary indexes:
You probably should
IF

drop an existing secondary index on


the partitioning column set of a
partitioning expression

it is not required to enforce the uniqueness of the partitioning


column set of a partitioning expression. In practice, it is unlikely
that uniqueness would be required in this case.
This has the following benefits:
The partitioning itself might provides performance benefits similar to
those realized by the secondary index.
General system performance is enhanced because there is no
need to maintain the secondary index subtable.
Not having a secondary index subtable realizes considerable disk

GE MSAT Internal

savings.

add a new secondary index on the primary index column set

one or
more
colum
ns of
the
partitio
ning
colum
n set
is not
also a
memb
er of
the
primar
y
index
colum
n set
and
the
primar
y
index
values
are
unique
.
In this
situati
on,
you
must
define
a USI
on the
primar
y
index
colum
n set
to
enforc
e its
unique
ness
becau
se you
cannot
define
a
partitio
ned
primar
y
index
to be
unique
unless
its
definiti
on
contai
ns all
of the
partitio
ning
colum
ns.

GE MSAT Internal

In
some
cases,
the
additio
n of
the
USI
results
in
worse
perfor
mance

When you are doing an analysis of whether a table should have partitioning or not, always weigh the costs of a
given strategy set against its benefits carefully.
You should consider the partitioning expression at minimum.
You must consider all of the following factors when making your analysis of a partitioning expression.
Would the proposed workloads against the table be better supported by a partitioning expression based on a
CASE_N, RANGE_N, or some other expression?
Should the partitioning expression specify a NO CASE, NO CASE OR UNKNOWN, NO RANGE, NO RANGE OR
UNKNOWN, or UNKNOWN option?
If the test value in a RANGE_N expression can never be null, there is no need for an UNKNOWN or NO RANGE
OR UNKNOWN partition.
If a condition in a CASE_N expression can never be unknown, there is no need for an UNKNOWN or NO CASE
OR UNKNOWN partition.
Should the table be partitioned on only one level or on multiple levels?
If the partitioning expression specifies CURRENT_DATE functions, CURRENT_TIMESTAMP functions, or both,
how should the expression be configured to minimize problems deriving from reconciling to a new current date or
current timestamp value?
The query workloads that will be accessing the partitioned table.
This factor must be examined at both the specific, or particular, level and at the general, overall level.
Among the factors to be considered are the following:
Performance
Does an unpartitioned table perform better than a partitioned table for the given workload and for particularly
critical queries?
Is one partitioning strategy more high-performing than others?
Do other indexes such as USIs, NUSIs, join indexes, or hash indexes improve performance?
Does a partitioning expression cause significant row partition elimination for queries to occur or not?
Access methods and predicate conditions
Is access to the table typically made by primary index, secondary index, or some other access method?
Do typical queries specify an equality condition on the primary index and include the complete partitioning column
set?
Do typical queries specify a non-equality condition on the primary index or the partitioning columns?
Join strategies
Do typical queries specify an equality condition on the primary index column set (and partitioning column set if they
are not identical)?
Do typical queries specify an equality condition on the primary index column set but not on the partitioning column
set?

GE MSAT Internal

Do typical query conditions support row partition elimination?


Data maintenance
What are the relative effects of a partitioned table versus an unpartitioned table with respect to the maintenance
workload for the table?
If you must define a USI on the primary index to make it unique, how much additional maintenance is required to
update the USI subtable?
Frequency and ease of altering partitions
Is a process in place to ensure that ranges are added and dropped as necessary?
Does the partitioning expression permit you to add and drop ranges?
If the number of rows moved by dropping and adding ranges causes large numbers of rows to be moved, do you
have a process in place to instead create a new table with the desired partitioning in place, then INSERT
SELECT or MERGE the source table rows into the newly created target table with error logging?
If you want to delete rows when their partition is dropped, have you specified NO RANGE, NO RANGE OR
UNKNOWN, or UNKNOWN partitions?
Backup and restore strategies.
See Teradata Archive/Recovery Utility Reference for details of how partitioning might affect your archive and
restore strategies.
See Single-Level Partitioning Case Studies for a set of design scenarios that evaluate the performance effects
of several different partitioned primary indexes.
See Single-AMP Queries and Partitioned Tables and All-AMP Tactical Queries and Partitioned Tables for
specific design issues related to partitioned table support for tactical queries.

Copyright 1998-2013 by Teradata Corporation. All Rights Reserved.

GE MSAT Internal

You might also like