Optimal partitioning expressions are typically coded using CASE_N or RANGE_N expressions based on exact
numeric, character or DateTime columns. DateTime expressions can include the BEGIN and END bound
functions and the DATE, CURRENT_DATE, and CURRENT_TIMESTAMP functions.
The test value you specify with a RANGE_N function must result in a BYTEINT, SMALLINT, INTEGER, BIGINT,
DATE, TIMESTAMP(n), TIMESTAMP(n) WITH TIME ZONE, CHARACTER, VARCHAR, GRAPHIC, or
VARGRAPHIC data type.
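As a rough illustration of how a RANGE_N expression maps a test value to a partition number, the following Python sketch models a partitioning expression with one range per calendar month. It is only a model under stated assumptions: the function name range_n_monthly is hypothetical, partition numbers are assumed to start at 1, and the NO RANGE and UNKNOWN options are left out, so a null or out-of-range test value simply yields None.

```python
from datetime import date


def range_n_monthly(test_value, start, end):
    """Model a RANGE_N expression with one range per calendar month.

    Assumptions (not taken from the source text): `start` is the first day
    of a month, partition numbers begin at 1, and no NO RANGE or UNKNOWN
    partition is defined, so unmatched values return None.
    """
    if test_value is None:
        return None  # null test value, no UNKNOWN partition in this model
    if not (start <= test_value <= end):
        return None  # out of range, no NO RANGE partition in this model
    # Count whole months between the start of the range and the test value.
    months = (test_value.year - start.year) * 12 + (test_value.month - start.month)
    return months + 1


if __name__ == "__main__":
    first, last = date(2006, 1, 1), date(2006, 12, 31)
    print(range_n_monthly(date(2006, 8, 15), first, last))
```

Rows whose dates fall in the same month receive the same partition number, which is what lets the system later treat a month as a unit.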
Support for the BEGIN and END bound functions also includes support for the IS [NOT] UNTIL_CHANGED and
IS [NOT] UNTIL_CLOSED functions.
If you specify both a BEGIN and END bound function in a partitioning expression, then the conditions only match
when equality exists for both bounds. You can specify any valid Date or Time element for the IS [NOT]
UNTIL_CHANGED and IS [NOT] UNTIL_CLOSED functions in partitioning expressions based on the CASE_N
function.
For recommendations on optimal ways to row-partition the primary index of the various types of temporal tables,
see Temporal Table Support.
Partitioning is designed to optimize range queries while also providing efficient primary index join strategies.
Analyze your range query optimization needs carefully, because partitioning trades improved range access against
possibly slower primary index accesses and slower joins and aggregations on the primary index. The size of that
penalty depends on the number of populated row partitions.
When a table or join index is created with partitioning, its rows are hashed to the appropriate AMPs and then
assigned to their computed internal partition number based on the value of the partitioning expressions defined
by the user when the table was created or altered. The evaluation of the partitioning expression determines the
partition number for a row. Once a row has been dispatched to its home AMP, Teradata Database converts the
partition numbers from each partitioning level to an internal partition number that is stored in the rowID.
The value of a PARTITION or PARTITION#Ln is determined by reading the partition number field in the rowID
and then converting that back to the appropriate combined partition number of the specified partitioning level
whenever you submit a query that specifies PARTITION or PARTITION#Ln in its select list.
In other words, the value you see as a PARTITION number when you select the PARTITION (or PARTITION#Ln
for multilevel partitioning) is actually not a column that is stored with the other columns of a table row, but is a
virtual column whose value is determined by decoding the bit pattern stored in the internal partition number of the
rowID.
The partitioning maxima using different types of partitioning expressions are listed in the following table. These
maxima apply to both row-partitioned tables and join indexes and to column-partitioned tables and join indexes.
Partitioning Parameter
Maximum Value
65,535
9,223,372,036,854,775,807
GE MSAT Internal
Valid range for the number of partitioning levels for a table or join index with 2-byte partitioning: 1 - 15
Valid range for the number of partitioning levels for a table or join index with 8-byte partitioning: 1 - 62
RANGE_N
65,535
RANGE_N
9,223,372,036,854,775,807
CASE_N
Number of conditions.
When you have a partitioned table, the bits stored in the internal partition number of the rowID represent the
combined partition number of the row, which is determined by combining the partitioning expressions of a
PARTITION BY clause into a single expression. The expression that combines the partitioning expressions of
multilevel partitioning into a single combined partition number is called a combined partitioning expression.
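One way to picture a combined partitioning expression is as a mixed-radix encoding of the per-level partition numbers. The Python sketch below is an illustration under that assumption, not a description of the actual internal format: the function names and the example level sizes are hypothetical, and it models only the combined partition number, not the internal partition number stored in the rowID.

```python
def combined_partition_number(level_numbers, level_sizes):
    """Combine 1-based per-level partition numbers into one 1-based number.

    Assumes a mixed-radix scheme in which the first level is the most
    significant; this is an illustrative model only.
    """
    if len(level_numbers) != len(level_sizes):
        raise ValueError("one partition number is needed per level")
    combined = 0
    for number, size in zip(level_numbers, level_sizes):
        if not 1 <= number <= size:
            raise ValueError("partition number outside the level's range")
        combined = combined * size + (number - 1)
    return combined + 1


def level_partition_numbers(combined, level_sizes):
    """Decode a combined number back into per-level numbers (PARTITION#Ln)."""
    remainder = combined - 1
    numbers = []
    for size in reversed(level_sizes):
        numbers.append(remainder % size + 1)
        remainder //= size
    return list(reversed(numbers))


if __name__ == "__main__":
    sizes = [12, 4]  # hypothetical: 12 months at level 1, 4 regions at level 2
    c = combined_partition_number([8, 3], sizes)
    print(c, level_partition_numbers(c, sizes))
```

Decoding reverses the encoding, which mirrors the idea that a per-level partition number can be recovered from a single stored value rather than stored as a column of its own.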
The terms internal partition number and combined partition number are defined as follows.
Term
Definition
Internal partition number
The value computed for the partitioning expressions for a row in a partitioned table.
For a table with an unpartitioned primary index, the combined partition number is always 0.
For a table with single-level partitioning, the combined partition number is the same as the value of the single partitioning expression.
When you specify the system-determined columns PARTITION or PARTITION#Ln in a request, Teradata Database converts the internal partition number stored in the rowID into the appropriate external partition number or the partition number of the specified partitioning level and returns that value to the requestor.
Once assigned to a row partition, the rows are stored in row hash order, as illustrated by the following graphic,
which is best viewed on a color monitor and best printed on a color printer.
Teradata Database groups partitioned rows into partition groups on an AMP first by their internal partition number,
then by row hash value within each internal partition, and then by uniqueness value.
The following graphics illustrate the difference between an unpartitioned table and a partitioned table. Both tables
contain the same data for the year 2006. Each colored rectangle represents a row. Rows with the same color are
in the same month. Rows in the same box have the same row hash based on order_number, which is the
primary index column.
For an unpartitioned table, the rows are ordered only by their hash value. For a table that is partitioned by
month, the rows are first ordered by month based on order_date and then ordered by their hash values for the
primary index column. Note that within an internal partition there are fewer rows per hash value.
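The ordering described above can be sketched as a sort on a three-part key. In the following Python example the Row type, the sample rows, and the hash values are all hypothetical; the only point is the key order: internal partition number, then row hash, then uniqueness value.

```python
from collections import namedtuple

# Hypothetical row descriptor: partition number, row hash, uniqueness value,
# and the order_number the row carries.
Row = namedtuple("Row", "partition row_hash uniqueness order_number")

ROWS = [
    Row(8, 0x2A, 1, 1001),
    Row(1, 0x7F, 1, 1002),
    Row(8, 0x10, 1, 1003),
    Row(1, 0x7F, 2, 1004),
]


def partitioned_order(rows):
    """Order rows by partition number, then row hash, then uniqueness."""
    return sorted(rows, key=lambda r: (r.partition, r.row_hash, r.uniqueness))


def unpartitioned_order(rows):
    """Order rows as if unpartitioned: row hash, then uniqueness only."""
    return sorted(rows, key=lambda r: (r.row_hash, r.uniqueness))


if __name__ == "__main__":
    print([r.order_number for r in partitioned_order(ROWS)])
    print([r.order_number for r in unpartitioned_order(ROWS)])
```

In the partitioned ordering, rows from the same month sit next to each other regardless of their hash values, while the unpartitioned ordering interleaves months.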
If a SELECT request specifies values for all the primary index columns, the AMP that contains those rows can be
determined, and only one AMP needs to be accessed. If the query conditions are not specified on the partitioning
columns, then each internal partition can be probed to find the rows based on the hash value, assuming there is
no usable alternative index. If conditions are also specified on the partitioning columns, then row partition
elimination might further reduce the number of partitions to be probed on that AMP (see SQL Request and
Transaction Processing for information about row partition elimination).
If a SELECT request does not specify the values for all the primary index columns, an all-AMP, full-table scan is
required for a table with no partitioning when there is no usable alternative index. However, with partitioning, if
conditions are specified on the partitioning columns, row partition elimination might reduce what would otherwise
be an all-AMP, full-table scan to an all-AMP scan of only the internal partitions that are not eliminated. The extent
of row partition elimination depends on the partitioning expressions, the conditions specified in the query, and the
ability of the Optimizer to recognize such opportunities.
Suppose a query only requests orders dated August, 2010. For an unpartitioned table, the whole table is read to
determine which rows are from August because the August rows are scattered throughout the table. For the
partitioned table, on the other hand, row partition elimination can be used to exclude all the internal partitions that
do not contain activity for August. Because all the August rows can be grouped together in the partitioned table, it
becomes possible to position directly to the first row for August, then read sequentially until a September row is
found, at which point the system stops reading rows. For this query, only the August portion
of the partitioned table must be read. If you partition by day and change the query to make the ending date
August 2nd, an even smaller subset of the table is all that must be read.
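The effect of row partition elimination on a range query can be sketched with a small model. The Python example below assumes a table partitioned by month starting in January 2006, with all twelve partitions populated; the function names and the dates are hypothetical, and the real extent of elimination depends on the Optimizer as noted earlier.

```python
from datetime import date


def month_partition(d, first_year=2006):
    """Map a date to a 1-based monthly partition number (hypothetical scheme)."""
    return (d.year - first_year) * 12 + d.month


def partitions_to_scan(query_start, query_end, populated):
    """Return the populated partitions that a date-range query cannot eliminate."""
    low, high = month_partition(query_start), month_partition(query_end)
    return sorted(p for p in populated if low <= p <= high)


if __name__ == "__main__":
    populated = range(1, 13)  # one populated partition per month of the year
    august = partitions_to_scan(date(2006, 8, 1), date(2006, 8, 31), populated)
    print(august, "of", len(populated), "partitions scanned")
```

With monthly partitions, a query for the first two days of August still has to read the whole August partition; a finer partitioning by day would narrow the scan further, at the cost of many more partitions.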
When rows for current day activity are inserted later tonight into the unpartitioned table, they are scattered
throughout the entire table, so the average hits per block value is very low. For the partitioned table on the other
hand, the inserts are clustered in a smaller area. This provides a better locality of reference. If the partitioning is
by week or by day, then the inserts are clustered even more tightly, and the hits per block value is very high.
If you want to delete the oldest month of data, the task is a full-table scan for the unpartitioned table, but for the
partitioned table the rows are clustered together, so it is possible to delete them all with a high number of average
hits per block. Transient journaling is not done for each of the rows, further enhancing performance when all the
rows of an internal partition are deleted as the last action of a transaction.
You can see that if the table is partitioned on order date, there is no need for a NUSI on order date, and if you
had one for the unpartitioned table, you could drop it when you converted the primary index for the table to a
partitioned primary index.
You should consider defining a table with partitioning to support either of the following workload characteristics:
A majority of the queries in the workload specify range predicates on some column, particularly a date column, of
the candidate partitioned table.
A majority of the queries in the workload specify an equality predicate on some column of the candidate partitioned
table, and that column is either:
Not the only column in the primary index column set
or
Not a primary index column.
In addition to these two workload characteristics, one of the following sets of characteristics of the primary index
should also be considered. One of these three cases is always true.
The primary index is used:
Primarily or exclusively to distribute rows evenly across the AMPs.
Rarely, if ever, to access rows or to join tables.
To access rows using a condition specified on a column that is suitable for partitioning the table.
or
The primary index, defined with the entire set of partitioning columns, is used:
To distribute rows evenly across the AMPs.
To access rows directly or as a table join condition.
or
The primary index, defined without the entire set of partitioning columns, is used:
Depending on the number of internal partitions that contain data, accessing rows by means of an equality
constraint on their primary index can be slower for a partitioned table than for the equivalent unpartitioned table
when you neither include all of the partitioning columns in the primary index definition nor specify a constraint on
those columns in the query.
The following design considerations are important for this partitioned performance characteristic.
The more coarse the granularity of the partitioning you define, or the more internal partitions eliminated, the less
performance degradation you are likely to experience.
Defining an appropriate secondary index on the partitioned table can sometimes minimize the performance
degradation of primary index access.
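The cost behind these considerations can be sketched as a count of the internal partitions that must be probed for one primary index value. The Python model below is a simplification with a hypothetical function name; it assumes that when the primary index includes all partitioning columns, the partition can be computed from the supplied values, so a single probe suffices.

```python
def partitions_probed(populated, pi_has_all_partitioning_columns, constrained=None):
    """Estimate how many internal partitions are probed for one PI value.

    populated: partition numbers that currently contain rows.
    constrained: partitions left after row partition elimination, or None
    when the query places no condition on the partitioning columns.
    """
    populated = set(populated)
    if not populated:
        return 0
    if pi_has_all_partitioning_columns:
        return 1  # assumption: the partition follows from the PI values
    if constrained is not None:
        return len(populated & set(constrained))
    return len(populated)


if __name__ == "__main__":
    print(partitions_probed(range(1, 366), False))       # daily partitions
    print(partitions_probed(range(1, 13), False))        # monthly partitions
    print(partitions_probed(range(1, 13), False, {8}))   # condition on month
```

Coarser partitioning or a condition on the partitioning columns both shrink the number of probes, which matches the first consideration listed above.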
Joins can be different for partitioned and unpartitioned tables that are otherwise equivalent, but the effect of the
different join strategies that arise cannot be predicted easily in many cases.
The join plan the Optimizer pursues depends on the picture it has of the data demographics based on collected
statistics, dynamic-AMP samples, and derived statistics. The usual recommendation applies here: check
EXPLAIN reports to determine the best way to design your indexes to achieve the optimal join geography.
The following design considerations are important for this partitioned performance characteristic.
Primary index-to-primary index joins are more likely to generate different join plans than other
partitioned-unpartitioned join comparisons.
To minimize the potential for performance issues in making primary index-to-primary index joins, consider the
following guidelines:
Partition the two tables identically if possible.
A coarser granularity of partitions is likely to be superior to a finer partition granularity.
Examine your EXPLAIN reports to determine which join methods the Optimizer is selecting to join the tables.
Rowkey partitioned table joins are generally better than joins based on another family of join methods.
Efficient row partition elimination can often convert joins that would otherwise be poor performers into good
performers.
The most likely candidate for poor join performance is found when you are joining a partitioned table with an
unpartitioned table, the partitioned table partitioning column set is not defined in the unpartitioned table, there are
no predicates on the partitioned table partitioning column set, and there are many internal partitions defined for
the partitioned table.
The need for secondary indexes is often different for partitioned and unpartitioned tables that are otherwise
equivalent.
Several opposing scenarios present themselves for resolving the issues that might arise from the existence or
absence of secondary indexes:
IF one or more columns of the partitioning column set is not also a member of the primary index column set and the
primary index values are unique:
In this situation, you must define a USI on the primary index column set to enforce its uniqueness because you
cannot define a partitioned primary index to be unique unless its definition contains all of the partitioning columns.
In some cases, the addition of the USI results in worse performance.
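The uniqueness rule stated above reduces to a subset test on column sets. The following Python sketch uses hypothetical function and column names to show the check; it models only the rule as stated here and nothing else about index definition.

```python
def can_define_unique_partitioned_pi(primary_index_columns, partitioning_columns):
    """Return True when every partitioning column is also a PI column."""
    return set(partitioning_columns) <= set(primary_index_columns)


def needs_usi_for_uniqueness(primary_index_columns, partitioning_columns, pi_values_unique):
    """A USI on the PI columns is needed when the PI values are unique
    but the partitioned primary index itself cannot be declared unique."""
    return pi_values_unique and not can_define_unique_partitioned_pi(
        primary_index_columns, partitioning_columns
    )


if __name__ == "__main__":
    # Hypothetical table: PI on order_number, partitioned on order_date.
    print(needs_usi_for_uniqueness(["order_number"], ["order_date"], True))
```

For the order table used in the earlier examples, the primary index on order_number does not contain order_date, so uniqueness would have to be enforced through a USI.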
When you are doing an analysis of whether a table should have partitioning or not, always weigh the costs of a
given strategy set against its benefits carefully.
You should consider the partitioning expression at minimum.
You must consider all of the following factors when making your analysis of a partitioning expression.
Would the proposed workloads against the table be better supported by a partitioning expression based on a
CASE_N, RANGE_N, or some other expression?
Should the partitioning expression specify a NO CASE, NO CASE OR UNKNOWN, NO RANGE, NO RANGE OR
UNKNOWN, or UNKNOWN option?
If the test value in a RANGE_N expression can never be null, there is no need for an UNKNOWN or NO RANGE
OR UNKNOWN partition.
If a condition in a CASE_N expression can never be unknown, there is no need for an UNKNOWN or NO CASE
OR UNKNOWN partition.
Should the table be partitioned on only one level or on multiple levels?
If the partitioning expression specifies CURRENT_DATE functions, CURRENT_TIMESTAMP functions, or both,
how should the expression be configured to minimize problems deriving from reconciling to a new current date or
current timestamp value?
The query workloads that will be accessing the partitioned table.
This factor must be examined at both the specific, or particular, level and at the general, overall level.
Among the factors to be considered are the following:
Performance
Does an unpartitioned table perform better than a partitioned table for the given workload and for particularly
critical queries?
Does one partitioning strategy perform better than the others?
Do other indexes such as USIs, NUSIs, join indexes, or hash indexes improve performance?
Does the partitioning expression lead to significant row partition elimination for the queries?
Access methods and predicate conditions
Is access to the table typically made by primary index, secondary index, or some other access method?
Do typical queries specify an equality condition on the primary index and include the complete partitioning column
set?
Do typical queries specify a non-equality condition on the primary index or the partitioning columns?
Join strategies
Do typical queries specify an equality condition on the primary index column set (and partitioning column set if they
are not identical)?
Do typical queries specify an equality condition on the primary index column set but not on the partitioning column
set?