GE Internal
Statistics quantify the data distribution and storage characteristics of tables, columns,
indexes and partitions. The cost-based optimizer (CBO) uses these statistics to estimate how
much I/O and memory are required to execute a SQL statement using a particular execution plan.
Statistics are stored in the data dictionary, and they can be exported from one database and
imported into another. One situation where you might want to do this is transferring production
statistics to a test system to simulate the real environment, even though the test system may
hold only small samples of the data.
To give the Oracle cost-based optimizer the most up-to-date information about schema
objects (and the best chance of choosing a good execution plan), all application tables and
indexes to be accessed must be analyzed. New statistics should be gathered on schema objects
whose statistics are out of date. Loading or deleting large amounts of data obviously changes
the number of rows. Other changes, such as updating a large number of rows, do not affect the
number of rows but may affect the average row length.
Statistics can be generated with the ANALYZE statement or with the DBMS_STATS package
(introduced in Oracle8i). DBMS_STATS is well suited to DBAs managing database statistics,
and it deals only with statistics used by the CBO. The package lets the DBA create, modify,
view and delete statistics through a standard, well-defined set of procedures. Statistics can
be gathered on tables, indexes, columns, partitions and schemas, but note that the package does
not generate statistics for clusters.
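To make the two approaches concrete, here is a minimal sketch of each. The SCOTT.EMP table is only an illustrative choice, and the options shown are one reasonable combination rather than a recommendation.

```sql
-- Traditional approach: the ANALYZE statement
ANALYZE TABLE scott.emp COMPUTE STATISTICS;

-- DBMS_STATS approach: gather table statistics and, via cascade, its indexes
BEGIN
  DBMS_STATS.gather_table_stats (
    ownname => 'SCOTT',
    tabname => 'EMP',
    cascade => TRUE);
END;
/
```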
DBMS_STATS provides a mechanism for you to view and modify optimizer statistics gathered for
database objects. The statistics can reside in two different locations:
The dictionary.
A table created in the user's schema for this purpose.
Only statistics stored in the dictionary itself have an impact on the cost-based optimizer.
When you generate statistics for a table, column, or index, if the data dictionary already contains
statistics for the object, then Oracle updates the existing statistics. Oracle also invalidates any
currently parsed SQL statements that access the object.
The next time such a statement executes, the optimizer automatically chooses a new execution
plan based on the new statistics. Distributed statements issued on remote databases that access
the analyzed objects use the new statistics the next time Oracle parses them.
When you associate a statistics type with a column or domain index, Oracle calls the statistics
collection method in the statistics type if you analyze the column or domain index.
Missing statistics
When statistics do not exist on schema objects, the optimizer uses the following default values.
Tables
Statistic                     Default value
Cardinality                   100 rows
Average row length            20 bytes
No. of blocks                 100
Remote cardinality            2000 rows
Remote average row length     100 bytes
Indexes
Statistic                     Default value
Levels                        1
Leaf blocks                   25
Leaf blocks/key               1
Data blocks/key               1
Distinct keys                 100
Clustering factor             800 (8 * no. of blocks)
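One way to find objects that would fall back to these defaults is to look for rows with no analysis date. The queries below are a sketch that assumes you are connected as the schema owner and that a NULL LAST_ANALYZED is an adequate sign of missing statistics.

```sql
-- Tables in the current schema that have never been analyzed
SELECT table_name
FROM   user_tables
WHERE  last_analyzed IS NULL
ORDER  BY table_name;

-- Indexes in the current schema that have never been analyzed
SELECT index_name, table_name
FROM   user_indexes
WHERE  last_analyzed IS NULL
ORDER  BY index_name;
```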
Analyze
o The only method available for collecting statistics in Oracle 8.0 and lower.
o ANALYZE can only run serially.
o ANALYZE cannot overwrite or delete certain types of statistics that were
generated by DBMS_STATS.
o ANALYZE calculates global statistics for partitioned tables and indexes
instead of gathering them directly. This can lead to inaccuracies for some
statistics, such as the number of distinct values.
DBMS_STATS
o Only available for Oracle 8i and higher.
o Statistics can be generated to a statistics table and can then be imported or
exported between databases and re-loaded into the data dictionary at any
time. This allows the DBA to experiment with various statistics.
o DBMS_STATS routines have the option to run via parallel query or operate
serially.
o Can gather statistics for sub-partitions or partitions.
o Certain DDL commands (e.g. CREATE INDEX) automatically generate
statistics, eliminating the need to generate statistics explicitly
after the DDL command.
o DBMS_STATS does not generate information about chained rows and the
structural integrity of segments.
o The DBA can set a particular table, a whole schema or the entire database
to be automatically monitored when a modification occurs. When enabled,
any change (insert, update, delete, direct load, truncate, etc.) that occurs on
a table will be tracked in the SGA. This information is incorporated into
the data dictionary by the SMON process at a pre-set interval (every 3
hours in Oracle 8.1.x, and every 15 minutes in Oracle 9i). The information
collected by this monitoring can be seen in the DBA_TAB_MODIFICATIONS
view. Oracle 9i introduced a new procedure in the DBMS_STATS package
called FLUSH_DATABASE_MONITORING_INFO. The DBA can use
this procedure to flush the monitored table data more frequently. Oracle 9i
will also automatically call this procedure prior to executing DBMS_STATS
for statistics gathering purposes. Note that this procedure is not included
with Oracle 8i.
o DBMS_STATS provides a more efficient, scalable solution for statistics
gathering and should be used over the traditional ANALYZE command,
which does not support features such as parallelism and stale statistics
collection.
o Use of table monitoring in conjunction with DBMS_STATS stale object
statistics generation is highly recommended for environments with large,
random and/or sporadic data changes. These features allow the database to
determine more efficiently which tables should be re-analyzed, rather than the
DBA having to force statistics collection for all tables (including those
that have not changed enough to merit a re-scan).
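The stale-statistics approach described in the last bullet can be sketched as follows. This assumes monitoring has already been enabled on the tables in the SCOTT schema; without monitoring data, Oracle has nothing on which to base the stale decision.

```sql
BEGIN
  -- Re-gather only for objects whose monitoring data marks them as stale
  DBMS_STATS.gather_schema_stats (
    ownname => 'SCOTT',
    cascade => TRUE,
    options => 'GATHER STALE');
END;
/
```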
What gets collected?
Table Statistics
Oracle collects the following statistics for a table. Statistics marked with an asterisk are always
computed exactly. Table statistics, including the status of domain indexes, appear in the data
dictionary views USER_TABLES, ALL_TABLES, and DBA_TABLES in the columns shown in
parentheses.
* Number of data blocks below the high water mark (that is, the number of
data blocks that have been formatted to receive data, regardless of whether
they currently contain data or are empty) (BLOCKS)
* Number of data blocks allocated to the table that have never been used
(EMPTY_BLOCKS)
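The columns named above can be inspected directly. This query is a small sketch against USER_TABLES, using the EMP table purely as an example.

```sql
SELECT table_name, blocks, empty_blocks, last_analyzed
FROM   user_tables
WHERE  table_name = 'EMP';
```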
Index Statistics
Oracle collects the following statistics for an index. Statistics marked with an asterisk are always
computed exactly. For conventional indexes, the statistics appear in the data dictionary views
USER_INDEXES, ALL_INDEXES, and DBA_INDEXES in the columns in parentheses.
* Depth of the index from its root block to its leaf blocks (BLEVEL)
Average number of data blocks per index value (for an index on a table)
(AVG_DATA_BLOCKS_PER_KEY)
Clustering factor (how well ordered the rows are about the indexed values)
(CLUSTERING_FACTOR)
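The index statistics listed above can be read back in the same way. The query below is a sketch that lists them for every index on an example table, EMP, in the current schema.

```sql
SELECT index_name, blevel, avg_data_blocks_per_key, clustering_factor
FROM   user_indexes
WHERE  table_name = 'EMP';
```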
Statistics columns in the dictionary views include CHAIN_CNT, AVG_ROW_LEN,
AVG_SPACE_FREELIST_BLOCKS, NUM_FREELIST_BLOCKS, SAMPLE_SIZE, LAST_ANALYZED, GLOBAL_STATS,
USER_STATS and PCT_DIRECT_ACCESS.
When computing statistics, an entire object is scanned to gather data about the object. This data
is used by Oracle to compute exact statistics about the object. Slight variances throughout the
object are accounted for in these computed statistics. Because an entire object is scanned to
gather information for computed statistics, the larger the size of an object, the more work that is
required to gather the necessary information.
To perform an exact computation, Oracle requires enough space to perform a scan and sort of the
table. If there is not enough space in memory, then temporary space may be required. For
estimations, Oracle requires enough space to perform a scan and sort of only the rows in the
requested sample of the table. For indexes, computation does not take up as much time or space,
so it is best to perform a full computation.
Some statistics are always computed exactly, such as the number of data blocks currently
containing data in a table or the depth of an index from its root block to its leaf blocks.
Use estimation for tables and clusters rather than computation, unless you need exact values.
Because estimation rarely sorts, it is often much faster than computation, especially for large
tables.
ESTIMATE STATISTICS
ESTIMATE STATISTICS instructs Oracle to estimate statistics about the analyzed object and
store them in the data dictionary.
When estimating statistics, Oracle gathers representative information from portions of an object.
This subset of information provides reasonable, estimated statistics about the object. The
accuracy of estimated statistics depends upon how representative the sampling used by Oracle is.
Only parts of an object are scanned to gather information for estimated statistics, so an object can
be analyzed quickly. You can optionally specify the number or percentage of rows that Oracle
should use in making the estimate.
To estimate statistics, Oracle selects a random sample of data. You can specify the sampling
percentage and whether sampling should be based on rows or blocks.
Block sampling reads a random sample of blocks and uses all of the rows
in those blocks for estimates. This reduces the amount of I/O activity for a
given sample size, but it can reduce the randomness of the sample if rows
are not randomly distributed on disk. Block sampling is not available for
index statistics.
The default estimate of the ANALYZE command reads only approximately the first 1064
rows of the table, so the results often leave a lot to be desired.
The general consensus is that the default of 1064 rows is not sufficient for
accurate statistics on tables of any real size. Many people report
that estimating statistics on 30 percent of the rows produces very accurate
results. I personally have been running estimates at 35 percent. This seems to
produce very accurate numbers, and it saves a lot of time over full scans.
Note that if an estimate samples 50% or more of a table, Oracle converts the
estimate to a full COMPUTE STATISTICS.
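As a sketch of what a 35 percent estimate looks like in practice, the statements below request it first through ANALYZE and then through DBMS_STATS. SCOTT.EMP is an illustrative table, and the right percentage for your data may differ.

```sql
-- Estimate from a 35 percent row sample using ANALYZE
ANALYZE TABLE scott.emp ESTIMATE STATISTICS SAMPLE 35 PERCENT;

-- A comparable request through DBMS_STATS (row sampling, not block sampling)
BEGIN
  DBMS_STATS.gather_table_stats (
    ownname          => 'SCOTT',
    tabname          => 'EMP',
    estimate_percent => 35,
    block_sample     => FALSE);
END;
/
```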
stattab : Name of the table to create. This value should be passed as the stattab
parameter to other procedures when the user does not want to modify the
dictionary statistics directly.
tblspace : Tablespace in which to create the stat tables. If none is specified, then
they are created in the user's default tablespace.
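A call using the two parameters described above might look like the sketch below. The owner and table name follow the examples used elsewhere in this article; the USERS tablespace is an assumption and should be replaced with one that exists in your database.

```sql
BEGIN
  DBMS_STATS.create_stat_table (
    ownname  => 'SCOTT',
    stattab  => 'STATS_TABLE_BACKUP',
    tblspace => 'USERS');  -- assumed tablespace name
END;
/
```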
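estimate_percent NUMBER DEFAULT NULL,
block_sample BOOLEAN DEFAULT FALSE,
method_opt VARCHAR2 DEFAULT 'FOR ALL COLUMNS SIZE 1',
degree NUMBER DEFAULT NULL,
granularity VARCHAR2 DEFAULT 'DEFAULT',
cascade BOOLEAN DEFAULT FALSE,
stattab VARCHAR2 DEFAULT NULL,
statid VARCHAR2 DEFAULT NULL,
options VARCHAR2 DEFAULT 'GATHER',
statown VARCHAR2 DEFAULT NULL);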
method_opt : Method options of the following format (the phrase 'SIZE 1' is
required to ensure gathering statistics in parallel and for use with the phrase
hidden):
FOR ALL [INDEXED | HIDDEN] COLUMNS [SIZE integer]
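To illustrate the method_opt format, the sketch below restricts column statistics to indexed columns and keeps SIZE 1. The table is an example only, and whether indexed columns are the right scope depends on your workload.

```sql
BEGIN
  -- Column statistics for indexed columns only, one bucket each
  DBMS_STATS.gather_table_stats (
    ownname    => 'SCOTT',
    tabname    => 'EMP',
    method_opt => 'FOR ALL INDEXED COLUMNS SIZE 1');
END;
/
```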
stattab : User stat table identifier describing where to save the current statistics.
DBMS_STATS.export_schema_stats (
ownname VARCHAR2,
stattab VARCHAR2,
statid VARCHAR2 DEFAULT NULL,
statown VARCHAR2 DEFAULT NULL);
stattab : User stat table identifier describing where to store the statistics.
stattab : User stat table identifier describing from where to retrieve the statistics.
stattab : User stat table identifier describing from where to delete the statistics. If
stattab is NULL, then the statistics are deleted directly in the dictionary.
statid : Identifier (optional) to associate with these statistics within stattab (Only
pertinent if stattab is not NULL).
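Put together, an export followed by an import could look like the sketch below. It assumes the stat table STATS_TABLE_BACKUP already exists in the SCOTT schema and that, for a transfer between databases, the table has been copied to the target by whatever means you prefer (that step is not shown).

```sql
-- On the source database: copy SCOTT's dictionary statistics into the stat table
BEGIN
  DBMS_STATS.export_schema_stats (
    ownname => 'SCOTT',
    stattab => 'STATS_TABLE_BACKUP',
    statid  => 'BACKUP_TEST1');
END;
/

-- On the target database, after the stat table has been moved across:
-- load the saved statistics into the data dictionary
BEGIN
  DBMS_STATS.import_schema_stats (
    ownname => 'SCOTT',
    stattab => 'STATS_TABLE_BACKUP',
    statid  => 'BACKUP_TEST1');
END;
/
```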
partname : Name of the table partition in which to store the statistics. If the table
is partitioned and partname is NULL, then the statistics are stored at the global
table level.
stattab : User stat table identifier describing where to store the statistics. If stattab
is NULL, then the statistics are stored directly in the dictionary.
statid : Identifier (optional) to associate with these statistics within stattab (Only
pertinent if stattab is not NULL).
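As a sketch of how these parameters fit together, the call below sets table-level statistics by hand with DBMS_STATS.set_table_stats. The row and block counts are invented values for illustration, for example to make a small test table look production-sized to the optimizer.

```sql
BEGIN
  -- stattab is left NULL, so the values go straight into the dictionary
  DBMS_STATS.set_table_stats (
    ownname => 'SCOTT',
    tabname => 'EMP',
    numrows => 1000000,   -- made-up row count
    numblks => 20000,     -- made-up block count
    avgrlen => 92);       -- average row length in bytes
END;
/
```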
partname : Name of the table partition from which to get the statistics. If the
table is partitioned and if partname is NULL, then the statistics are retrieved from
the global table level.
stattab : User stat table identifier describing from where to retrieve the statistics.
If stattab is NULL, then the statistics are retrieved directly from the dictionary.
statid : Identifier (optional) to associate with these statistics within stattab (Only
pertinent if stattab is not NULL).
partname : Name of the index partition for which to get the statistics. If the index
is partitioned and if partname is NULL, then the statistics are retrieved for the
global index level.
stattab : User stat table identifier describing from where to retrieve the statistics.
If stattab is NULL, then the statistics are retrieved directly from the dictionary.
statid : Identifier (optional) to associate with these statistics within stattab (Only
pertinent if stattab is not NULL).
avglblk : Average integral number of leaf blocks in which each distinct key
appears for this index (partition).
BEGIN
  DBMS_STATS.gather_schema_stats (
    ownname          => 'scott',
    estimate_percent => null,
    block_sample     => false,
    method_opt       => 'FOR ALL COLUMNS SIZE 1',
    degree           => null,
    granularity      => 'ALL',
    cascade          => true,
    options          => 'GATHER');
END;
/
BEGIN
DBMS_STATS.delete_schema_stats ('scott');
END;
/
BEGIN
  DBMS_STATS.delete_schema_stats (
    ownname => 'scott',
    stattab => 'stats_table_backup',
    statid  => 'BACKUP_TEST1',
    statown => 'scott');
END;
/
variable NUMROWS number
variable NUMBLKS number
variable AVGRLEN number
BEGIN
  DBMS_STATS.get_table_stats (
    'scott',
    'emp',
    NUMROWS => :numrows,
    NUMBLKS => :numblks,
    AVGRLEN => :avgrlen);
END;
/
PL/SQL procedure successfully completed.
SQL> print NUMROWS NUMBLKS AVGRLEN
NUMROWS
----------
1000
NUMBLKS
----------
28
AVGRLEN
----------
92
variable NUMROWS number
variable NUMLBLKS number
variable NUMDIST number
variable AVGLBLK number
variable AVGDBLK number
variable CLSTFCT number
variable INDLEVEL number
BEGIN
  DBMS_STATS.get_index_stats (
    'SCOTT',
    'EMP_PK',
    NUMROWS  => :NUMROWS,
    NUMLBLKS => :NUMLBLKS,
    NUMDIST  => :NUMDIST,
    AVGLBLK  => :AVGLBLK,
    AVGDBLK  => :AVGDBLK,
    CLSTFCT  => :CLSTFCT,
    INDLEVEL => :INDLEVEL);
END;
/
PL/SQL procedure successfully completed.
SQL> print NUMROWS NUMLBLKS NUMDIST AVGLBLK AVGDBLK CLSTFCT INDLEVEL
NUMROWS
----------
1000
NUMLBLKS
----------
3
NUMDIST
----------
1000
AVGLBLK
----------
1
AVGDBLK
----------
1
CLSTFCT
----------
15
INDLEVEL
----------
1
The objlist parameter identifies an output parameter for the LIST STALE and LIST EMPTY
options. The objlist parameter is of type DBMS_STATS.OBJECTTAB.
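A sketch of how objlist might be consumed with the LIST STALE option follows. The variable name l_objlist is arbitrary, and the objtype and objname fields are assumed to be the members of the record type that make up DBMS_STATS.OBJECTTAB; check the package specification for your release before relying on them.

```sql
SET SERVEROUTPUT ON
DECLARE
  l_objlist DBMS_STATS.ObjectTab;
BEGIN
  -- LIST STALE only reports stale objects; it does not gather statistics
  DBMS_STATS.gather_schema_stats (
    ownname => 'SCOTT',
    options => 'LIST STALE',
    objlist => l_objlist);

  -- Print the type and name of each object reported as stale
  FOR i IN 1 .. l_objlist.COUNT LOOP
    DBMS_OUTPUT.put_line (l_objlist(i).objtype || ' ' || l_objlist(i).objname);
  END LOOP;
END;
/
```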
Step 1 : Perform a quick analyze to load in base statistics
BEGIN
  DBMS_STATS.GATHER_SCHEMA_STATS (
    ownname          => 'scott',
    estimate_percent => null,
    block_sample     => false,
    method_opt       => 'FOR ALL COLUMNS',
    degree           => null,
    granularity      => 'ALL',
    cascade          => true,
    options          => 'GATHER'
  );
END;
/
TABLE_NAME                       NUM_ROWS     BLOCKS AVG_ROW_LEN
------------------------------ ---------- ---------- -----------
EMP                                  1500         28          92
Monitor all of the tables within Scott's schema. (Oracle 9i and higher)
BEGIN
DBMS_STATS.alter_schema_tab_monitoring('scott', true);
END;
/
PL/SQL procedure successfully completed.
Monitor all of the tables within the database. (Oracle 9i and higher)
Note: Although the option to collect statistics for SYS tables is available via
ALTER_DATABASE_TAB_MONITORING, Oracle continues to recommend against this practice
until the next major release after 9i Release 2. Also note that the
ALTER_DATABASE_TAB_MONITORING procedure in the DBMS_STATS package only
monitors tables; there is an ALTER INDEX...MONITORING statement which can be used to
monitor indexes. Thanks to Nabil Nawaz for providing this and pointing out an error I made in
the previous version of this article.
BEGIN
  DBMS_STATS.alter_database_tab_monitoring (
    monitoring => true,
    sysobjs    => false);  -- Don't set to true, see note above.
END;
/
PL/SQL procedure successfully completed.
DEPT                           NO
EMP                            YES
exec dbms_stats.flush_database_monitoring_info;
PL/SQL procedure successfully completed.
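Once the monitoring information has been flushed, the tracked changes can be inspected. The query below is a sketch against USER_TAB_MODIFICATIONS, the current-schema counterpart of the DBA_TAB_MODIFICATIONS view mentioned earlier.

```sql
-- DML activity recorded for monitored tables since statistics were last gathered
SELECT table_name, inserts, updates, deletes, timestamp
FROM   user_tab_modifications
ORDER  BY table_name;
```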
DBMS_UTILITY.ANALYZE_SCHEMA procedure
DBMS_UTILITY.ANALYZE_DATABASE procedure
DBMS_DDL.ANALYZE_OBJECT procedure
User-generated statistics can be set only through the DBMS_STATS.SET_xx_STATS
procedures.
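To see whether a table's statistics were entered by a user rather than gathered, the USER_STATS column can be checked. The query is a sketch using the EMP table as an example.

```sql
SELECT table_name, num_rows, user_stats, global_stats
FROM   user_tables
WHERE  table_name = 'EMP';
```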