CSD Univ. of Crete, Fall 2005

QUERY OPTIMIZATION AND TUNING IN ORACLE


Query Tuning Hints


- Avoid redundant DISTINCT
- Change nested queries to join
- Avoid unnecessary temp tables
- Avoid complicated correlation subqueries
- Join on clustering and integer attributes
- Avoid HAVING when WHERE is enough
- Avoid views with unnecessary joins
- Maintain frequently used aggregates
- Avoid external loops
- Avoid cursors
- Retrieve needed columns only
- Use direct path for bulk loading


Avoid Redundant DISTINCT

SELECT DISTINCT ssnum FROM Employee
WHERE dept = 'information systems'


- DISTINCT usually entails a sort operation
  - Slows down query optimization, because there is one more interesting order to consider
- Remove DISTINCT if you know the result has no duplicates (or duplicates are acceptable), or if the answer contains a key
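The point can be checked with a quick sketch using Python's standard sqlite3 module (not Oracle; the table and data are invented to mirror the slide's example):

```python
import sqlite3

# ssnum is a key, so DISTINCT is redundant: it adds a duplicate-elimination
# step without changing the answer.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employee (ssnum INTEGER PRIMARY KEY, dept TEXT)")
con.executemany("INSERT INTO Employee VALUES (?, ?)",
                [(1, "information systems"),
                 (2, "information systems"),
                 (3, "sales")])

with_distinct = con.execute(
    "SELECT DISTINCT ssnum FROM Employee "
    "WHERE dept = 'information systems'").fetchall()
without_distinct = con.execute(
    "SELECT ssnum FROM Employee "
    "WHERE dept = 'information systems'").fetchall()

# Same answer either way, because ssnum is a key of the result
assert sorted(with_distinct) == sorted(without_distinct)
```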


Avoid HAVING when WHERE is enough

SELECT MIN(E.age) FROM Employee E
GROUP BY E.dno
HAVING E.dno = 102

vs.

SELECT MIN(E.age) FROM Employee E
WHERE E.dno = 102

- The HAVING version may first perform the grouping for all departments!
- Consider the DBMS's use of indexes when writing arithmetic expressions: E.age = 2*D.age will benefit from an index on E.age, but might not benefit from an index on D.age!
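The rewrite above can be sketched with Python's sqlite3 module (not Oracle; the data values are made up): the WHERE form filters rows before grouping, while the HAVING form conceptually groups every department first, yet both return the same answer.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employee (ssnum INTEGER, dno INTEGER, age INTEGER)")
con.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                [(1, 102, 30), (2, 102, 25), (3, 101, 40)])

# HAVING: filter applied after grouping all departments
via_having = con.execute(
    "SELECT MIN(age) FROM Employee GROUP BY dno HAVING dno = 102").fetchone()

# WHERE: filter applied before grouping, same result
via_where = con.execute(
    "SELECT MIN(age) FROM Employee WHERE dno = 102").fetchone()

assert via_having == via_where == (25,)
```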


Avoid Using Intermediate Relations


Using an intermediate relation:

SELECT * INTO Temp
FROM Emp E, Dept D
WHERE E.dno = D.dno AND D.mgrname = 'Joe'

and

SELECT T.dno, AVG(T.sal)
FROM Temp T
GROUP BY T.dno

vs. a single query:

SELECT E.dno, AVG(E.sal)
FROM Emp E, Dept D
WHERE E.dno = D.dno AND D.mgrname = 'Joe'
GROUP BY E.dno

- Creating the Temp table causes an update to the catalog, and the grouping query cannot use any index on the original tables
- The single query does not materialize the intermediate relation Temp
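A minimal sqlite3 sketch of the two formulations (not Oracle; SELECT INTO becomes CREATE TEMP TABLE ... AS here, and the data is invented), showing they return the same rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Emp (ssnum INTEGER, dno INTEGER, sal REAL)")
con.execute("CREATE TABLE Dept (dno INTEGER, mgrname TEXT)")
con.executemany("INSERT INTO Emp VALUES (?, ?, ?)",
                [(1, 10, 100.0), (2, 10, 200.0), (3, 20, 300.0)])
con.executemany("INSERT INTO Dept VALUES (?, ?)", [(10, "Joe"), (20, "Ann")])

# Temp-table version: two statements, materializes an intermediate relation
con.execute("CREATE TEMP TABLE Temp AS "
            "SELECT E.* FROM Emp E, Dept D "
            "WHERE E.dno = D.dno AND D.mgrname = 'Joe'")
via_temp = con.execute("SELECT dno, AVG(sal) FROM Temp GROUP BY dno").fetchall()

# Single-query version: nothing is materialized
direct = con.execute(
    "SELECT E.dno, AVG(E.sal) FROM Emp E, Dept D "
    "WHERE E.dno = D.dno AND D.mgrname = 'Joe' GROUP BY E.dno").fetchall()

assert via_temp == direct == [(10, 150.0)]
```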


Optimizing Set Difference Queries


Suppose you have to select all of the employees that are not account representatives:

Table s_emp: soc_number, last_name, first_name, salary
Table s_account_rep: soc_number, last_name, first_name, region

This query is slower:

SELECT soc_number FROM s_emp
MINUS
SELECT soc_number FROM s_account_rep;

because MINUS has to select distinct values from both tables.


Optimizing Set Difference Queries


This query is a little faster:

SELECT soc_number FROM s_emp
WHERE soc_number NOT IN
  (SELECT soc_number FROM s_account_rep);

Faster, but still not as fast as possible, because we are not joining and are not using indexes. The following query is faster:

SELECT /*+ index(t1) */ soc_number
FROM s_emp t1
WHERE NOT EXISTS
  (SELECT /*+ index(t1) index(t2) */ *
   FROM s_account_rep t2
   WHERE t1.soc_number = t2.soc_number);
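The three formulations return the same rows, which can be checked with sqlite3 (not Oracle; SQLite spells MINUS as EXCEPT, does not support optimizer hints, and the data is invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE s_emp (soc_number INTEGER)")
con.execute("CREATE TABLE s_account_rep (soc_number INTEGER)")
con.executemany("INSERT INTO s_emp VALUES (?)", [(1,), (2,), (3,), (4,)])
con.executemany("INSERT INTO s_account_rep VALUES (?)", [(2,), (4,)])

# Set difference (EXCEPT is SQLite's MINUS)
minus = con.execute("SELECT soc_number FROM s_emp "
                    "EXCEPT SELECT soc_number FROM s_account_rep").fetchall()
# NOT IN formulation
not_in = con.execute(
    "SELECT soc_number FROM s_emp WHERE soc_number NOT IN "
    "(SELECT soc_number FROM s_account_rep)").fetchall()
# NOT EXISTS formulation
not_exists = con.execute(
    "SELECT soc_number FROM s_emp t1 WHERE NOT EXISTS "
    "(SELECT * FROM s_account_rep t2 "
    " WHERE t1.soc_number = t2.soc_number)").fetchall()

assert sorted(minus) == sorted(not_in) == sorted(not_exists) == [(1,), (3,)]
```

Note that NOT IN behaves differently if the subquery can return NULLs; the NOT EXISTS form avoids that pitfall.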

Change Nested Queries to Join


SELECT ssnum FROM Employee
WHERE dept IN (SELECT dept FROM Techdept)

- Might not use the index on Employee.dept

SELECT ssnum FROM Employee, Techdept
WHERE Employee.dept = Techdept.dept

- Need DISTINCT if an employee might belong to multiple departments
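A sqlite3 sketch of the rewrite (not Oracle; invented data), confirming the nested and join forms agree when DISTINCT is added to the join:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employee (ssnum INTEGER, dept TEXT)")
con.execute("CREATE TABLE Techdept (dept TEXT)")
con.executemany("INSERT INTO Employee VALUES (?, ?)",
                [(1, "db"), (2, "os"), (3, "hr")])
con.executemany("INSERT INTO Techdept VALUES (?)", [("db",), ("os",)])

nested = con.execute(
    "SELECT ssnum FROM Employee "
    "WHERE dept IN (SELECT dept FROM Techdept)").fetchall()

# DISTINCT guards against duplicates if an employee matches several rows
join = con.execute(
    "SELECT DISTINCT ssnum FROM Employee, Techdept "
    "WHERE Employee.dept = Techdept.dept").fetchall()

assert sorted(nested) == sorted(join) == [(1,), (2,)]
```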


Avoid Complicated Correlation Subqueries


SELECT ssnum FROM Employee e1
WHERE salary =
  (SELECT MAX(salary) FROM Employee e2
   WHERE e2.dept = e1.dept)

- Searches all of e2 for each e1 record!

SELECT MAX(salary) AS bigsalary, dept
INTO Temp
FROM Employee
GROUP BY dept

SELECT ssnum
FROM Employee, Temp
WHERE salary = bigsalary
AND Employee.dept = Temp.dept
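The decorrelation can be sketched with sqlite3 (not Oracle; SELECT INTO becomes CREATE TEMP TABLE ... AS, data invented), showing the correlated and rewritten queries agree:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employee (ssnum INTEGER, dept TEXT, salary REAL)")
con.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                [(1, "db", 100.0), (2, "db", 200.0), (3, "os", 150.0)])

# Correlated form: inner query re-evaluated per outer row
correlated = con.execute(
    "SELECT ssnum FROM Employee e1 WHERE salary = "
    "(SELECT MAX(salary) FROM Employee e2 "
    " WHERE e2.dept = e1.dept)").fetchall()

# Rewrite: compute per-dept maxima once, then join
con.execute("CREATE TEMP TABLE Temp AS "
            "SELECT MAX(salary) AS bigsalary, dept FROM Employee GROUP BY dept")
rewritten = con.execute(
    "SELECT ssnum FROM Employee, Temp "
    "WHERE salary = bigsalary AND Employee.dept = Temp.dept").fetchall()

assert sorted(correlated) == sorted(rewritten) == [(2,), (3,)]
```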

Avoid Complicated Correlation Subqueries


[Chart: throughput improvement (percent) from rewriting the correlated subquery, for result sizes > 1000 and > 10000 rows, compared across SQL Server 2000, Oracle 8i, and DB2 V7.1]

- SQL Server 2000 does a good job at handling correlated subqueries (a hash join is used, as opposed to a nested loop between query blocks)
  - The techniques implemented in SQL Server 2000 are described in "Orthogonal Optimization of Subqueries and Aggregates" by C. Galindo-Legaria and M. Joshi, SIGMOD 2001


Join on Clustering and Integer Attributes


SELECT Employee.ssnum
FROM Employee, Student
WHERE Employee.name = Student.name

- Employee is clustered on ssnum
- ssnum is an integer

SELECT Employee.ssnum
FROM Employee, Student
WHERE Employee.ssnum = Student.ssnum


Avoid Views with Unnecessary Joins


CREATE VIEW Techlocation AS
SELECT ssnum, Techdept.dept, location
FROM Employee, Techdept
WHERE Employee.dept = Techdept.dept

SELECT dept FROM Techlocation WHERE ssnum = 4444

- Joins with Techdept unnecessarily

SELECT dept FROM Employee WHERE ssnum = 4444
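A sqlite3 sketch of the slide's view (not Oracle; invented data). Note the two queries agree here because employee 4444 belongs to a tech department; if it did not, the view query would return no row while the direct query would.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employee (ssnum INTEGER, dept TEXT)")
con.execute("CREATE TABLE Techdept (dept TEXT, location TEXT)")
con.executemany("INSERT INTO Employee VALUES (?, ?)",
                [(4444, "db"), (5555, "os")])
con.executemany("INSERT INTO Techdept VALUES (?, ?)",
                [("db", "Heraklion"), ("os", "Athens")])
con.execute("CREATE VIEW Techlocation AS "
            "SELECT ssnum, Techdept.dept, location FROM Employee, Techdept "
            "WHERE Employee.dept = Techdept.dept")

# Through the view: joins with Techdept even though dept comes from Employee
via_view = con.execute(
    "SELECT dept FROM Techlocation WHERE ssnum = 4444").fetchall()
# Directly on the base table: no join needed
direct = con.execute(
    "SELECT dept FROM Employee WHERE ssnum = 4444").fetchall()

assert via_view == direct == [("db",)]
```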


Aggregate Maintenance
- Materialize an aggregate if it is needed frequently
- Use a trigger to keep it up to date

create trigger updateVendorOutstanding
on orders for insert as
  update vendorOutstanding
  set amount =
    (select vendorOutstanding.amount + sum(inserted.quantity * item.price)
     from inserted, item
     where inserted.itemnum = item.itemnum)
  where vendor = (select vendor from inserted);
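The trigger above is T-SQL (the `inserted` pseudo-table is SQL Server specific). The same idea can be sketched with a sqlite3 trigger, where the inserted row is visible as NEW; schema and data are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE item (itemnum INTEGER PRIMARY KEY, price REAL);
CREATE TABLE orders (vendor TEXT, itemnum INTEGER, quantity INTEGER);
CREATE TABLE vendorOutstanding (vendor TEXT PRIMARY KEY, amount REAL);

-- Maintain the materialized aggregate on every insert into orders
CREATE TRIGGER updateVendorOutstanding AFTER INSERT ON orders
BEGIN
  UPDATE vendorOutstanding
  SET amount = amount + NEW.quantity *
      (SELECT price FROM item WHERE itemnum = NEW.itemnum)
  WHERE vendor = NEW.vendor;
END;
""")
con.execute("INSERT INTO item VALUES (1, 5.0)")
con.execute("INSERT INTO vendorOutstanding VALUES ('acme', 0.0)")
con.execute("INSERT INTO orders VALUES ('acme', 1, 3)")

amount = con.execute(
    "SELECT amount FROM vendorOutstanding WHERE vendor = 'acme'").fetchone()[0]
assert amount == 15.0
```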


Avoid External Loops


No loop:

sqlStmt = "select * from lineitem where l_partkey <= 200";
odbc->prepareStmt(sqlStmt);
odbc->execPrepared(sqlStmt);

Loop:

sqlStmt = "select * from lineitem where l_partkey = ?";
odbc->prepareStmt(sqlStmt);
for (int i = 1; i < 200; i++) {
  odbc->bindParameter(1, SQL_INTEGER, i);
  odbc->execPrepared(sqlStmt);
}


Avoid External Loops

[Chart: throughput (records/sec) for the loop vs. no-loop versions, SQL Server 2000 on Windows 2000]

- Crossing the application interface has a significant impact on performance
- Let the DBMS optimize set operations
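The ODBC sketch above can be mirrored with sqlite3 (not the slide's environment; both predicates use `< 200` here so the two versions return exactly the same rows):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE lineitem (l_partkey INTEGER)")
con.executemany("INSERT INTO lineitem VALUES (?)",
                [(k,) for k in range(1, 300)])

# No loop: one set-oriented query, a single crossing of the interface
no_loop = con.execute(
    "SELECT l_partkey FROM lineitem WHERE l_partkey < 200").fetchall()

# Loop: one point query per value, crossing the interface each time
looped = []
for i in range(1, 200):
    looped += con.execute(
        "SELECT l_partkey FROM lineitem WHERE l_partkey = ?", (i,)).fetchall()

assert sorted(no_loop) == sorted(looped)
assert len(no_loop) == 199
```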


Avoid Cursors
No cursor:

select * from employees;

Cursor:

DECLARE d_cursor CURSOR FOR select * from employees;
OPEN d_cursor
while (@@FETCH_STATUS = 0)
BEGIN
  FETCH NEXT from d_cursor
END
CLOSE d_cursor
go

- SQL Server 2000 on Windows 2000: response time is a few seconds with a single SQL query, but more than an hour iterating over a cursor

[Chart: throughput (records/sec), cursor vs. plain SQL]
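The cursor-vs-set contrast can be sketched in Python with sqlite3 (not SQL Server; data invented): fetching the whole result at once versus pulling it one row per call.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employees (id INTEGER)")
con.executemany("INSERT INTO employees VALUES (?)",
                [(i,) for i in range(100)])

# Set-oriented: one call produces the whole result
all_rows = con.execute("SELECT * FROM employees").fetchall()

# Cursor-style: one fetch per row, paying per-row overhead
cur = con.execute("SELECT * FROM employees")
one_by_one = []
while True:
    row = cur.fetchone()
    if row is None:
        break
    one_by_one.append(row)

assert all_rows == one_by_one
assert len(all_rows) == 100
```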


Retrieve Needed Columns Only


All columns:

Select * from lineitem;

Covered subset:

Select l_orderkey, l_partkey, l_suppkey, l_shipdate, l_commitdate
from lineitem;

- Avoid transferring unnecessary data
- May enable the use of a covering index

[Chart: throughput (queries/msec) for "all" vs. "covered subset", with and without an index]
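The covering-index effect can be observed with sqlite3's EXPLAIN QUERY PLAN (not Oracle; index and column names invented, and the exact plan wording varies slightly across SQLite versions): when every selected column is in the index, the plan reports a covering index and the table itself is never touched.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE lineitem "
            "(l_orderkey INTEGER, l_partkey INTEGER, l_comment TEXT)")
con.execute("CREATE INDEX li_cover ON lineitem (l_partkey, l_orderkey)")

# Only indexed columns are selected, so the index covers the query
plan = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT l_orderkey, l_partkey FROM lineitem "
    "WHERE l_partkey = 7").fetchall()
detail = " ".join(str(row[-1]) for row in plan)
assert "COVERING INDEX" in detail
```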


Use Direct Path for Bulk Loading


sqlldr directpath=true control=load_lineitem.ctl data=E:\Data\lineitem.tbl

load data
infile "lineitem.tbl"
into table LINEITEM append
fields terminated by '|'
(
  L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY,
  L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS,
  L_SHIPDATE DATE "YYYY-MM-DD", L_COMMITDATE DATE "YYYY-MM-DD",
  L_RECEIPTDATE DATE "YYYY-MM-DD",
  L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT
)


Use Direct Path for Bulk Loading

[Chart: throughput (rec/sec) for conventional load, direct path load, and row-by-row inserts]

- Direct path loading bypasses the query engine and the storage manager
- It is orders of magnitude faster than a conventional bulk load (commit every 100 records) and than inserts (commit for each record)
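Direct path load is an Oracle server facility, but the underlying batching idea carries over to any client API. A sqlite3 sketch (invented schema): load the rows as one batched statement inside a single transaction rather than committing per row.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE LINEITEM (l_orderkey INTEGER, l_comment TEXT)")

rows = [(i, "comment %d" % i) for i in range(1000)]

# One batched insert in one transaction, instead of a commit per record
with con:
    con.executemany("INSERT INTO LINEITEM VALUES (?, ?)", rows)

count = con.execute("SELECT COUNT(*) FROM LINEITEM").fetchone()[0]
assert count == 1000
```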


ORACLE Query Optimization Approaches


- Oracle supports two approaches to query optimization: rule-based and cost-based; the cost-based optimizer was introduced in Oracle 7 in order to improve query optimization
- Rule-based: the optimizer ignores statistics
- Cost-based: three different goals
  - All_Rows: the optimizer optimizes with a goal of best throughput (minimum resource use to complete the entire statement)
  - First_Rows_n: the optimizer optimizes with a goal of best response time to return the first n rows; n can be 1, 10, 100, or 1000
  - First_Rows: the optimizer uses a mix of cost and heuristics to find a best plan for fast delivery of the first few rows


ORACLE Query Optimization Approaches


- Note: using heuristics sometimes leads the CBO to generate a plan whose cost is significantly larger than the cost of a plan found without applying the heuristic
- For a specific statement, the goal to be used by the optimizer can be stated using a hint
- To specify the optimizer's goal for an entire session, use:
  alter session set optimizer_mode = <MODE_VALUE>;
  where MODE_VALUE = {rule, all_rows, first_rows, first_rows_n, choose}


ORACLE Query Optimization Approaches


- The choose mode states that the optimizer chooses between a cost-based approach and a rule-based approach, depending on whether statistics are available. This is the default value
  - If the data dictionary contains statistics for at least one of the accessed tables, then the optimizer uses a cost-based approach and optimizes with a goal of best throughput
  - If the data dictionary contains only some statistics, then the cost-based approach is still used, but the optimizer must guess the statistics for the objects without any statistics. This can result in suboptimal execution plans


Rule-Based Approach
- When ignoring statistics and heuristics, there must still be a way to choose between the possible access paths suggested by different execution plans
- Thus, 15 access paths were ranked in order of efficiency. An access path for a table is chosen if the statement contains a predicate or other construct that makes that access path available
- A score is assigned to each execution strategy (plan) using these rankings, and the strategy with the best (lowest) score is selected
- When two strategies produce the same score, the tie is broken by making a decision based on the order in which the tables occur in the SQL statement


Understanding the RBO



RBO: An Example
- Suppose there is a table PropertyForRent with indexed attributes propertyNo, rooms, and city. Consider the query:

  SELECT propertyNo FROM PropertyForRent
  WHERE rooms > 7 AND city = 'Sydney'

  - Single-column access path using the index on city from the WHERE condition (city = 'Sydney'): rank 9
  - Unbounded range scan using the index on rooms from the WHERE condition (rooms > 7): rank 11
  - Full table scan: rank 15
  - Although there is an index on propertyNo, the column does not appear in the WHERE clause and so is not considered by the optimizer
- Based on these paths, the rule-based optimizer will choose to use the index on the city column

Cost-Based Approach
- The cost-based optimizer depends on statistics for all tables, clusters, and indexes accessed by the query
  - It is the user's responsibility to generate statistics and keep them up-to-date
- Two ways of generating and managing statistics:
  - By using the package DBMS_STATS, for example:
    EXECUTE DBMS_STATS.GATHER_SCHEMA_STATS(schema_name);
    where schema_name is the name of the user that owns the tables
  - By issuing the ANALYZE statement, for example:
    ANALYZE TABLE <table_name> COMPUTE/ESTIMATE STATISTICS;
    ANALYZE TABLE <table_name> COMPUTE/ESTIMATE STATISTICS FOR TABLE;
    ANALYZE TABLE <table_name> COMPUTE/ESTIMATE STATISTICS FOR ALL INDEXES;


Understanding the CBO


- Functionality
  - Parse the statement
  - Generate a list of all potential execution plans
  - Calculate (estimate) the cost of each execution plan
  - Select the plan with the lowest cost
- Parameters
  - Primary key / unique index
  - Non-unique index
  - Range evaluation (with bind variables)
  - Histograms
  - System resource usage
  - Current statistics


Query Tuning -- What to do?


- Problematic SQL statements usually have:
  - An excessive number of buffer gets
  - An excessive number of physical reads
- So, if we consume fewer resources, we save time
  - Reduce buffer gets (more efficient access paths)
    - Avoid (most) full table scans
    - Check the selectivity of index access paths
    - Stay away from nested loop joins on large row sources
  - Avoid physical I/O
    - Avoid (most) full table scans
    - Try to avoid sorts that write to disk, such as ORDER BY, GROUP BY, and merge joins (set an adequate sort_area_size)
    - Try to avoid hash joins writing to disk (hash_area_size)


Access Paths
- Next, some of the operation-option-description triples (corresponding to access paths) that can be found in execution plans are described
  - Not all of them are available with the rule-based optimizer
- For more details, check Table 9-4 in chapter 9 of the Oracle Database Performance Tuning Guide and Reference


B*-Tree Indexes
- Excellent performance for highly selective columns
  - Not effective for low-selectivity columns
- A unique scan is most efficient: an equality predicate on a unique index
- A range scan can be quite efficient, but be careful of the size of the range specified
- Excellent for FIRST_ROWS access, particularly with queries returning a small number of results
- Index access paths:
  - INDEX UNIQUE SCAN
  - INDEX RANGE SCAN
  - INDEX FULL SCAN
  - INDEX FAST FULL SCAN
  - INDEX SKIP SCAN (9i only)

B*-Tree Index Access Paths


- INDEX UNIQUE SCAN
  - Equality predicate on unique or primary key column(s)
  - Generally considered the most efficient access path
  - Usually no more than 3-4 buffer gets
  - If the table is small, a FULL TABLE SCAN could be cheaper
- INDEX RANGE SCAN
  - Equality predicate on a non-unique index, an incompletely specified unique index, or a range predicate on a unique index
  - Be careful of the size of the range
    - Large ranges could amount to a huge number of buffer gets
    - If so, consider an INDEX FAST FULL SCAN or a FULL TABLE SCAN


B*-Tree Index Access Paths


- INDEX FULL SCAN
  - Will scan the entire index by walking the tree, in index order
  - Provides ordered output; can be used to avoid sorts for ORDER BY clauses that specify index column order
  - Slower than an INDEX FAST FULL SCAN if there is no ORDER BY requirement
- INDEX FAST FULL SCAN
  - Will read the index in disk-block order and discard root and branch blocks
  - Will do db file scattered reads, reading db_file_multiblock_read_count blocks at a time
  - Equivalent to a FULL TABLE SCAN, but for an index
  - Fastest way to read the entire contents of an index

B*-Tree Index Access Paths


- INDEX SKIP SCAN (Oracle 9i only)
  - Allows some of the benefits of a multi-column index even without specifying the leading edge
  - Oracle will skip scan, starting with the root block, skipping through the B*-tree structure and masking sections of the tree that cannot contain applicable data
  - Could be costly, depending on the size of the index, the distribution of data, and bind variable values


Bitmap Indexes
- Are most often implemented in a data-warehouse environment
- Are useful for columns which:
  - have relatively low cardinality, where B*-Tree indexes fail to provide any benefit
  - are often specified along with other columns in the WHERE clauses of SQL statements; the optimizer will BITMAP AND the results of many single-column bitmap indexes together
- Are most efficient when doing COUNT(*) operations, where the optimizer can utilize the BITMAP CONVERSION COUNT access path
- Index access paths:
  - BITMAP INDEX SINGLE VALUE
  - BITMAP INDEX RANGE SCAN
  - BITMAP INDEX FULL SCAN
  - BITMAP AND, BITMAP OR, BITMAP NOT
  - BITMAP CONVERSION COUNT
  - BITMAP CONVERSION TO ROWIDs

Bitmap Index Access Paths


- BITMAP INDEX SINGLE VALUE
  - Used to satisfy an equality predicate
- BITMAP INDEX RANGE SCAN
  - Used to satisfy range operations such as BETWEEN
  - Unlike range scans on a B*-Tree, it is very efficient even for very large ranges
- BITMAP INDEX FULL SCAN
  - Used to satisfy a NOT predicate
  - Scans the entire index to identify rows NOT matching
- BITMAP AND, OR, NOT
  - Used for bitwise combinations of multiple bitmap indexes


Bitmap Conversions
- BITMAP CONVERSION COUNT
  - Used to evaluate a COUNT(*) operation for queries whose WHERE clause predicates only specify columns having bitmap indexes
  - Very fast, very efficient
- BITMAP CONVERSION TO ROWIDS
  - Used in cases where the row source produced by bitmap index operations needs to be joined to other row sources (e.g., a join to another table, or a GROUP BY operation) or to satisfy a TABLE ACCESS BY ROWID operation
  - More resource-intensive than BITMAP CONVERSION COUNT
  - Can be quite expensive if the number of ROWIDs is large


Other Miscellaneous Access Paths


- UNION, UNION-ALL, MINUS, INTERSECTION
  - Directly correspond to the SQL set operators
  - UNION-ALL is cheapest, since no SORT (UNIQUE) is required
- TABLE FULL SCAN
  - Reads all blocks allocated to the table
  - Can be the most efficient access path for small tables
  - Can cause significant physical I/O, particularly on larger tables
    - Consider ALTER TABLE table_name CACHE, or putting the table into the KEEP buffer pool


Other Miscellaneous Access Paths


- TABLE ACCESS BY ROWID
  - Generally used in conjunction with an index access path, where the rowid has been identified but Oracle needs access to a column not in the index
  - Consider whether adding a column to an existing index would provide substantial benefit
  - Cost is directly proportional to the number of rowid lookups required
- TABLE (HASH)
  - More efficient than index access
  - Requires creation of a hash cluster; more administrative overhead


Join Methods
- Nested Loops
  - Generally geared towards FIRST_ROWS access
  - Ideal for B*-Tree index-driven access and small row sources
  - When this is the case, always best for first-row response time
  - Can get very costly very quickly if no index path exists or the index path is inefficient
- Sort Merge
  - Generally geared towards ALL_ROWS access
  - Can be useful for joining small to medium-size row sources, particularly if a viable index path is not available or a cartesian join is desired
  - Be wary of sort_area_size: if it is too small, sorts will write to disk and performance will plummet
- Hash
  - The most controversial (and misunderstood) join method
  - Can be very powerful when applied correctly
  - Useful for joining small to medium-sized row sources to a large row source
  - Can be sensitive to instance parameters such as hash_area_size, hash_multiblock_io_count, and db_block_size

Optimize Joins
- Pick the best join method
  - Nested loops joins are best for indexed joins of subsets
  - Hash joins are usually the best choice for big joins
  - A hash join can only be used with equality predicates
  - Merge joins work on inequality predicates
  - If all index columns are in the WHERE clause, a merge join will be faster
- Pick the best join order
  - Pick the best driving table
  - Eliminate rows as early as possible in the join order
- Optimize special joins when appropriate
  - STAR joins for data-warehousing applications
  - STAR_TRANSFORMATION if you have bitmap indexes
  - ANTI-JOIN methods for NOT IN subqueries
  - SEMI-JOIN methods for EXISTS subqueries
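The two join strategies named above can be sketched in plain Python (a toy model, not Oracle's implementation): a nested loops join probes the inner input once per outer row, while a hash join builds a hash table on the smaller input and probes it once per row of the larger input, which is why hash joins require equality predicates.

```python
# Toy inputs: (ssnum, dept) and (dept, location)
emp = [(1, "db"), (2, "os"), (3, "db")]
dept = [("db", "Heraklion"), ("os", "Athens")]

def nested_loops_join(outer, inner):
    # Probe every inner row for every outer row
    return [(e, d) for e in outer for d in inner if e[1] == d[0]]

def hash_join(build, probe):
    # Build phase: hash the smaller input on the join key
    table = {}
    for d in build:
        table.setdefault(d[0], []).append(d)
    # Probe phase: one hash lookup per row of the larger input
    out = []
    for e in probe:
        for d in table.get(e[1], []):
            out.append((e, d))
    return out

# Both strategies produce the same join result
assert sorted(nested_loops_join(emp, dept)) == sorted(hash_join(dept, emp))
```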


Using ORACLE Optimization Modes


- When will the RBO be used?
  - OPTIMIZER_MODE = RULE
  - OPTIMIZER_MODE = CHOOSE and statistics are not present for any table in the SQL statement
  - ALTER SESSION SET OPTIMIZER_MODE = RULE has been issued
  - A RULE hint is present
- When will the CBO be used?
  - OPTIMIZER_MODE = CHOOSE and statistics are present for at least one table in the SQL statement
  - ALTER SESSION SET OPTIMIZER_MODE = {choose, first_rows, all_rows} has been issued
  - A CHOOSE, ALL_ROWS, or FIRST_ROWS hint is present


Tuning Tools
- A significant portion of SQL that performs poorly in production was originally crafted against empty or nearly empty tables
- Make sure you establish a reasonable subset of production data to use during development and tuning of SQL
- In order to monitor execution plans and tune queries, Oracle 9i (and higher) provides the following three tools:
  - The EXPLAIN PLAN command
  - The TkProf trace file formatter
  - The SQLTrace (or AutoTrace) facility
- These tools mainly allow the user to verify which access paths are used by an execution plan
  - Some of them also provide information about the number of buffers used, physical reads from buffers, rows returned from each step, etc.
- Effective SQL tuning requires either familiarity with these tools or the use of commercial alternatives such as SQLab


Explain Plan
- The EXPLAIN PLAN command reveals the execution plan for an SQL statement
  - The execution plan reveals the exact sequence of steps that the Oracle optimizer has chosen to process the SQL
- The execution plan is stored in an Oracle table called the PLAN_TABLE
  - Suitably formatted queries can be used to extract the execution plan from the PLAN_TABLE
  - Create the PLAN_TABLE: @$ORACLE_HOME/rdbms/admin/utlxplan.sql
  - Issue the EXPLAIN PLAN command:
    EXPLAIN PLAN SET statement_id = 'MJB' FOR select * from dual;
  - Issue a query to retrieve the execution plan: @$ORACLE_HOME/rdbms/admin/utlxpls.sql
- The more heavily indented an access path is, the earlier it is executed
  - If two steps are indented at the same level, the uppermost statement is executed first
  - Some access paths are joined, such as an index access that is followed by a table lookup
Plan_Table
create table PLAN_TABLE (
  statement_id    varchar2(30),
  timestamp       date,
  remarks         varchar2(80),
  operation       varchar2(30),
  options         varchar2(30),
  object_node     varchar2(128),
  object_owner    varchar2(30),
  object_name     varchar2(30),
  object_instance numeric,
  object_type     varchar2(30),
  optimizer       varchar2(255),
  search_columns  number,
  id              numeric,
  parent_id       numeric,
  position        numeric,
  cost            numeric,
  cardinality     numeric,
  bytes           numeric,
  other_tag       varchar2(255),
  partition_start varchar2(255),
  partition_stop  varchar2(255),
  partition_id    numeric,
  other           long,
  distribution    varchar2(30)
);


Explain Plan
- Sample query:

  EXPLAIN PLAN SET statement_id = 'MJB' FOR
  select doc_title
  from documents doc, page_collections pc
  where pc.pc_issue_date = '01-JAN-2002'
  and pc.pc_id = doc.pc_id;

- Sample EXPLAIN PLAN output:

  ------------------------------------------------------------------------------------------
  | Operation                 | Object_Name | Rows | Bytes | Cardinality | Pstart | Pstop |
  ------------------------------------------------------------------------------------------
  | SELECT STATEMENT          |             |  61K |    3M |         328 |        |       |
  |  NESTED LOOPS             |             |  61K |    3M |         328 |        |       |
  |   TABLE ACCESS BY INDEX RO| PAGE_COLL   |  834 |    9K |          78 |        |       |
  |    INDEX RANGE SCAN       | PC_PC2_UK   |  834 |       |           6 |        |       |
  |   INDEX RANGE SCAN        | DOC_DOC2_   |  86M |    4G |           3 |        |       |
  ------------------------------------------------------------------------------------------


Viewing the Execution Plan of a Query in Oracle

- Plan_Table: an SQL table
- Statement_id: plan identifier
- Id: a number assigned to each step
- Parent_id: the id of the next step, which operates on the output of this step
- Operation: the internal operation, e.g. SELECT, INSERT, etc.
- Options: the name of the internal operation


A More Complex EXPLAIN PLAN

[Slide figure: a more complex EXPLAIN PLAN output]


TkProf
- Provides more details than Autotrace or Explain Plan
- For more useful information:
  alter session set timed_statistics = true;
- To enable tracing:
  alter session set sql_trace = true;
- The trace file is written to user_dump_dest
- Usage:
  tkprof <trace_file> <output_file>


TkProf Sample Output


********************************************************************************
count   = number of times OCI procedure was executed
cpu     = cpu time in seconds executing
elapsed = elapsed time in seconds executing
disk    = number of physical reads of buffers from disk
query   = number of buffers gotten for consistent read
current = number of buffers gotten in current mode (usually for update)
rows    = number of rows processed by the fetch or execute call
********************************************************************************
<some text deleted>
select doc_title
from documents doc, page_collections pc
where pc.pc_issue_date = '01-JAN-2002'
and pc.pc_id = doc.pc_id

call     count    cpu  elapsed  disk  query  current   rows
-------  -----  -----  -------  ----  -----  -------  -----
Parse        1   0.00     0.01     0      0        0      0
Execute      1   0.00     0.00     0      0        0      0
Fetch     1415   0.07     0.09     0   1853        0  21206
-------  -----  -----  -------  ----  -----  -------  -----
total     1417   0.07     0.10     0   1853        0  21206


TkProf Sample Output

Rows    Row Source Operation
------  ---------------------------------------------------
21206   NESTED LOOPS
   31     TABLE ACCESS BY INDEX ROWID PAGE_COLLECTIONS
   31       INDEX RANGE SCAN (object id 22993)
21206     INDEX RANGE SCAN (object id 22873)

Rows    Execution Plan
------  ---------------------------------------------------
    0   SELECT STATEMENT GOAL: CHOOSE
21206     NESTED LOOPS
   31       TABLE ACCESS GOAL: ANALYZED (BY INDEX ROWID) OF 'PAGE_COLLECTIONS'
   31         INDEX GOAL: ANALYZED (RANGE SCAN) OF 'PC_PC2_UK' (UNIQUE)
21206       INDEX GOAL: ANALYZED (RANGE SCAN) OF 'DOC_DOC2_UK' (UNIQUE)


SQL_TRACE and tkprof


- ALTER SESSION SET SQL_TRACE = TRUE causes a trace of SQL execution to be generated
- The TKPROF utility formats the resulting output
- Tkprof output contains a breakdown of execution statistics, the execution plan, and the rows returned for each step
  - These statistics are not available from any other source
- Tkprof is the most powerful tool, but requires a significant learning curve


Tkprof output

[Slide figure: sample tkprof output]


Using SQLab
- Because EXPLAIN PLAN and tkprof are unwieldy and hard to interpret, third-party tools that automate the process and provide expert advice improve SQL tuning efficiency
- The Quest SQLab product:
  - Identifies SQL in your database that could benefit from tuning
  - Provides a sophisticated tuning environment to examine, compare, and evaluate execution plans
  - Incorporates an expert system to advise on indexing and SQL statement changes
- Features:
  - Displays the execution plan in a variety of intuitive ways
  - Provides easy access to statistics and other useful data
  - Models changes to SQL and immediately shows the results


SQLab SQL tuning lab



SQLab Expert Advice


- SQLab provides specific advice on how to tune an SQL statement


SQLab SQL trace integration


- SQLab can also retrieve the execution statistics that are otherwise only available through tkprof


Choosing a Driving Table


- The driving table is the table that is used first by Oracle in processing the query
  - Choosing the correct driving table is critical
- The driving table should be the table that returns the smallest number of rows and does the smallest number of buffer gets
  - The driving table is not necessarily the table with the smallest number of rows
- In the case of cost-based optimization, the driving table is the first table after the FROM clause. Thus, place the smallest table first after FROM, and list the tables from smallest to largest
  - The table order still makes a difference in execution time, even when using the cost-based optimizer


Choosing a Driving Table


- Example:

  select doc_title
  from documents doc, page_collections pc
  where pc.pc_issue_date = '01-JAN-2002'
  and pc.pc_id = doc.pc_id

- Which table should be driving?
  - DOCUMENTS has 110+ million rows
    - There are no filtering predicates on it in the WHERE clause, so all its rows will be in the row source
  - PAGE_COLLECTIONS has 1.4+ million rows
    - The PC_ISSUE_DATE predicate will filter it down to 30 rows


Using Hints
- Hints are used to convey your tuning suggestions to the optimizer
  - Misspelled or malformed hints are quietly ignored
- Commonly used hints include:
  - ORDERED
  - INDEX(table_alias index_name)
  - FULL(table_alias)
  - INDEX_FFS(table_alias index_name)
  - INDEX_COMBINE(table_alias index_name1 .. index_name_n)
  - AND_EQUAL(table_alias index_name1 index_name2 .. index_name5)
  - USE_NL(table_alias)
  - USE_MERGE(table_alias)
  - USE_HASH(table_alias)
- Hints should be specified as: /*+ hint */
  - Hints should immediately follow the SELECT keyword
  - The space following the "+" can be significant inside PL/SQL, due to a bug in the Oracle parser (see bug #697121)
- The driving table will never have a join method hint, since there is no row source to join it to
Simple Example of Tuning with Hints


l Initial SQL

select doc_id, doc_title, pc_issue_date from documents doc, page_collections pc where doc.pc_id = pc.pc_id and doc.doc_id = 9572422;
l Initial Execution Plan
Execution Plan ---------------------------------------------------------0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2575 Card=50 Bytes=3600) 1 0 MERGE JOIN (Cost=2575 Card=50 Bytes=3600) 2 1 TABLE ACCESS (BY INDEX ROWID) OF 'PAGE_COLLECTIONS' (Cost=2571 Card=1442348) 3 2 INDEX (FULL SCAN) OF 'PC_PK' (UNIQUE) (Cost=3084 Card=1442348) 4 1 SORT (JOIN) (Cost=3 Card=1 Bytes=60) 5 4 TABLE ACCESS (BY INDEX ROWID) OF 'DOCUMENTS' (Cost=1 Card=1 Bytes=60) 6 5 INDEX (UNIQUE SCAN) OF 'DOC_PK' (UNIQUE) (Cost=2 Card=1)

l Initial number of buffer gets: 444


60


Simple Example of Tuning with Hints


- First tuning attempt:

  select /*+ FULL(pc) */ doc_id, doc_title, pc_issue_date
  from documents doc, page_collections pc
  where doc.pc_id = pc.pc_id
  and doc.doc_id = 9572422;

- Tuned execution plan:

  Execution Plan
  ----------------------------------------------------------
  0     SELECT STATEMENT Optimizer=CHOOSE (Cost=3281 Card=50 Bytes=3600)
  1   0   HASH JOIN (Cost=3281 Card=50 Bytes=3600)
  2   1     TABLE ACCESS (FULL) OF 'PAGE_COLLECTIONS' (Cost=1675 Card=1442348 Bytes=17308176)
  3   1     TABLE ACCESS (BY INDEX ROWID) OF 'DOCUMENTS' (Cost=1 Card=1 Bytes=60)
  4   3       INDEX (UNIQUE SCAN) OF 'DOC_PK' (UNIQUE) (Cost=2 Card=1)

- Number of buffer gets: 364


Simple Example of Tuning with Hints


- Second tuning attempt:

  select /*+ ORDERED USE_NL(pc) */ doc_id, doc_title, pc_issue_date
  from documents doc, page_collections pc
  where doc.pc_id = pc.pc_id
  and doc.doc_id = 9572422;

- Second tuned execution plan:

  Execution Plan
  ----------------------------------------------------------
  0     SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=50 Bytes=3600)
  1   0   NESTED LOOPS (Cost=2 Card=50 Bytes=3600)
  2   1     TABLE ACCESS (BY INDEX ROWID) OF 'DOCUMENTS' (Cost=1 Card=1 Bytes=60)
  3   2       INDEX (UNIQUE SCAN) OF 'DOC_PK' (UNIQUE) (Cost=2 Card=2)
  4   1     TABLE ACCESS (BY INDEX ROWID) OF 'PAGE_COLLECTIONS' (Cost=1 Card=1442348)
  5   4       INDEX (UNIQUE SCAN) OF 'PC_PK' (UNIQUE) (Cost=1 Card=1442348)

- Number of buffer gets: 7


Considerations and Cautions


- Fundamental changes to the query structure give the optimizer different options
- Using a sub-select in the select list allowed a GROUP BY result without a GROUP BY operation, thus avoiding a costly BITMAP CONVERSION TO ROWIDS
- Other places where rewriting the query can have benefits:
  - Rewrite a sub-select as a join, which allows the optimizer more options
  - Consider EXISTS/NOT EXISTS and IN/NOT IN operations
- Adding hints to a large number of your SQL statements?
  - Take a step back and consider whether you need to tune your CBO parameters instead
  - Hand-tuning the majority of the SQL in an application will complicate the code and add a lot of time to the development effort
  - As new access paths are introduced in Oracle, statements that use hints will not utilize them, and will continue using the old access paths
- When individual statement tuning is necessary, a solid understanding of access paths, join order, and join methods is the key to success

Considerations and Cautions


- Use hints sparingly
  - If you have the opportunity, tune via CBO parameters first
  - Don't over-specify hints
  - SQL tuning is as important as ever: you need to understand the access paths, join orders, and join methods, even if only to evaluate what the CBO is doing
  - The CBO gets better with each release, but it will never know as much about the application and data model as a well-trained developer

Myths

• SQL tuned for the RBO will run well under the CBO
• SQL developers do not need to be retrained to write SQL for the CBO
• 8i and 9i do not support the RBO
• You can't run RULE and COST together
• Oracle says the CBO is unreliable and you should use RULE
• Hints can't be used in RULE

Top 9 Oracle SQL Tuning Tips

1. Design and develop with performance in mind
2. Index wisely
3. Reduce parsing
4. Take advantage of the Cost Based Optimizer
5. Avoid accidental table scans
6. Optimize necessary table scans
7. Optimize joins
8. Use array processing
9. Consider PL/SQL for tricky SQL

Design and Develop with Performance in Mind

• Explicitly identify performance targets
• Focus on critical transactions
  - Test the SQL for these transactions against simulations of production data
• Measure performance as early as possible
• Consider prototyping critical portions of the application
• Consider de-normalization and other performance-by-design features early on

De-Normalization

• If normalizing your OLTP database forces you to create queries with many joins (4 or more), consider de-normalizing
• De-normalization is the process of selectively taking normalized tables and re-combining their data in order to reduce the number of joins needed to produce the necessary query results
• Sometimes the addition of a single column of redundant data to a table from another table can reduce a 4-way join to a 2-way join, significantly boosting performance by reducing the time it takes to perform the join

De-Normalization

• Example: we have the following schema:

  Similarities: (user1, user2, similarity)
  Averages: (user, average)

• The Similarities table contains the similarity measure for all possible pairs of users, and the Averages table the average ratings of all users in the database
• In order to update all similarity measures we need the average value for each user
• Suppose we have over 1,000,000 users stored in our database (about 500 billion user pairs!)
• To avoid the join we could consider the following schema:

  Similarities: (user1, user2, similarity, average1, average2)
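As a rough illustration of the de-normalization above (Python with SQLite standing in for Oracle; the numbers are made up), the join against Averages can be replaced by a single read of the widened table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE similarities(user1, user2, similarity);
CREATE TABLE averages(user, average);
INSERT INTO similarities VALUES (1, 2, 0.8);
INSERT INTO averages VALUES (1, 3.5), (2, 4.1);
""")

# Normalized schema: a 3-way join to get a pair's similarity and both averages
joined = con.execute("""
    SELECT s.user1, s.user2, s.similarity, a1.average, a2.average
    FROM similarities s, averages a1, averages a2
    WHERE a1.user = s.user1 AND a2.user = s.user2""").fetchall()

# De-normalized schema: the averages ride along in the similarities table
con.execute("""CREATE TABLE similarities_wide(
                   user1, user2, similarity, average1, average2)""")
con.executemany("INSERT INTO similarities_wide VALUES (?,?,?,?,?)", joined)

# The same answer now comes from one table access instead of three
wide = con.execute("SELECT * FROM similarities_wide").fetchall()
print(joined, wide)
```

The price is keeping average1/average2 in sync whenever a user's average changes, which is exactly the caution raised on the following slide.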

De-Normalization

• While de-normalization can boost join performance, it can also have negative effects; by adding redundant data to tables, you risk the following problems:
  - More data means reading more data pages than otherwise needed, hurting performance
  - Redundant data can lead to data anomalies and bad data
  - In many cases, extra code will have to be written to keep redundant data in separate tables in sync, which adds to database overhead
• As you consider whether to de-normalize a database to speed up joins, be sure you first check that you have the proper indexes on the tables to be joined: your join performance problem may be more a lack of appropriate indexes than a matter of joining too many tables

Index Wisely

• Index to support selective WHERE clauses and join conditions
• Use concatenated indexes where appropriate
• Consider over-indexing to avoid table lookups
• Consider advanced indexing options
  - Hash clusters
    · When a table is queried frequently with equality queries
    · You can avoid using the ORDER BY clause, as well as sort operations
    · More administrative overhead
  - Bitmapped indexes
    · Can use large amounts of memory
    · Use sparingly
  - Index-only tables

Index Wisely

• Do not index columns that are modified frequently
  - UPDATE statements that modify indexed columns, and INSERT and DELETE statements that modify indexed tables, take longer than if there were no index, because the DBMS must modify data in indexes as well as data in tables
• Do not index keys that appear only with functions or operators
  - A WHERE clause that uses a function (other than MIN or MAX) or an operator with an indexed key does not make available the access path that uses the index (except with function-based indexes)
• When choosing to index a key, consider whether the performance gain for queries is worth the performance loss for INSERTs, UPDATEs, and DELETEs, plus the space required to store the index
  - You might want to experiment by comparing the processing times of the SQL statements with and without indexes; you can measure processing time with the SQL trace facility

Reduce Parsing

• Use bind variables
  - Bind variables are key to application scalability
  - If necessary, set CURSOR_SHARING to FORCE
• Reuse cursors in your application code
  - How to do this depends on your development language
• Use a cursor cache
  - Setting SESSION_CACHED_CURSORS can help applications that are not re-using cursors

Bind Values

• Use bind variables rather than literals in SQL statements whenever possible
• For example, the following two statements cannot use the same shared area because they do not match character for character:

  SELECT employee_id FROM employees WHERE department_id = 10;
  SELECT employee_id FROM employees WHERE department_id = 20;

• By replacing the literals with a bind variable, only one SQL statement would result, which could be executed twice:

  SELECT employee_id FROM employees WHERE department_id = :dept_id;

Bind Values

• In SQL*Plus you can use bind variables as follows:

  SQL> variable dept_id number
  SQL> exec :dept_id := 10
  SQL> SELECT employee_id FROM employees WHERE department_id = :dept_id;

• What we've done to the SELECT statement is take the literal value out of it and replace it with a placeholder (our bind variable); SQL*Plus passes the value of the bind variable to Oracle when the statement is processed
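The same idea in application code: with DB-API style placeholders (shown here with Python's sqlite3 as a stand-in; against Oracle you would use :name binds through a driver such as cx_Oracle), one statement text serves every value, so the parsed form can be shared:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employees(employee_id INTEGER, department_id INTEGER)")
con.executemany("INSERT INTO employees VALUES (?,?)",
                [(100, 10), (101, 10), (102, 20)])

# One statement text, two different bind values: the SQL matches
# character for character, so a shared parsed form can be reused
stmt = "SELECT employee_id FROM employees WHERE department_id = ? ORDER BY employee_id"
dept10 = con.execute(stmt, (10,)).fetchall()
dept20 = con.execute(stmt, (20,)).fetchall()
print(dept10, dept20)
```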

Cursors

Instead of:

  select count(*) into tot from s_emp where emp_id = v_emp_id;

declare a cursor for the count:

  cursor cnt_emp_cur(v_emp_id number) is
    select count(*) emp_total
    from s_emp
    where emp_id = v_emp_id;
  cnt_emp_rec cnt_emp_cur%rowtype;

or, if just checking for existence:

  cursor cnt_emp_cur(v_emp_id number) is
    select emp_id
    from s_emp
    where emp_id = v_emp_id
    and rownum = 1;

and then fetch from this cursor:

  open cnt_emp_cur(v_emp_id);
  fetch cnt_emp_cur into cnt_emp_rec;
  close cnt_emp_cur;
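The existence-check trick generalizes beyond PL/SQL: counting every matching row is wasted work when a yes/no answer suffices. A sketch in Python/SQLite, with LIMIT 1 playing the role of Oracle's rownum = 1:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE s_emp(emp_id INTEGER)")
con.executemany("INSERT INTO s_emp VALUES (?)", [(7,)] * 1000)

# COUNT(*) visits all 1000 matching rows just to learn "at least one exists"
count = con.execute("SELECT COUNT(*) FROM s_emp WHERE emp_id = ?",
                    (7,)).fetchone()[0]

# Fetching at most one row answers the same question and can stop immediately
exists = con.execute("SELECT 1 FROM s_emp WHERE emp_id = ? LIMIT 1",
                     (7,)).fetchone() is not None
print(count, exists)
```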

Take Advantage of the Cost Based Optimizer

• The older rule-based optimizer is inferior in almost every respect to the modern cost-based optimizer; basic RBO problems:
  - Incorrect driving table (40%)
  - Incorrect index (40%)
  - Incorrect driving index (10%)
• Using the cost-based optimizer effectively involves:
  - Regular collection of table statistics using the ANALYZE or DBMS_STATS command
  - Understanding hints and how they can be used to influence SQL statement execution
  - Choosing the appropriate optimizer mode:
    · FIRST_ROWS is best for OLTP applications
    · ALL_ROWS suits reporting and OLAP jobs
Analyze Wrong Data

• Tables were analyzed with incorrect data volumes
• When does this occur?
  - Table rebuilt
  - Index added
  - Schema migrated to production
  - Analyze run before a bulk load
• Missing stats:
  - Oracle will estimate the stats for you
  - These stats are for this execution only
  - Stats on indexes

Avoid Accidental Table Scans

• Table scans that occur unintentionally are a major source of poorly performing SQL
• Causes include:
  - Missing index
  - Using !=, <> or NOT: use inclusive range conditions or IN lists instead
  - Looking for values that are NULL: use NOT NULL columns with a default value
  - Using a function on indexed columns

Factors that can Cause an Index not to be Used

1) Using a function on the left side:

  SELECT * FROM s_emp WHERE substr(title,1,3) = 'Man';
  SELECT * FROM s_emp WHERE trunc(hire_date) = trunc(sysdate);

Since there is a function around the column, the index will not be used. This includes Oracle functions such as to_char, to_number, ltrim, rtrim, instr, trunc, rpad, lpad.

Solution: use LIKE:

  SELECT * FROM s_emp WHERE title LIKE 'Man%';

or use >=, <:

  SELECT * FROM s_emp WHERE hire_date >= trunc(sysdate) AND hire_date < trunc(sysdate) + 1;
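The effect is easy to observe with any optimizer that can explain itself. This sketch uses SQLite's EXPLAIN QUERY PLAN from Python (an illustration of the principle, not of Oracle's plan output): wrapping the indexed column in a function forces a scan, while the bare-column range can use the index:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE s_emp(emp_id INTEGER, hire_date TEXT)")
con.execute("CREATE INDEX idx_hire ON s_emp(hire_date)")

def plan(sql):
    # Each EXPLAIN QUERY PLAN row carries a human-readable 'detail' column
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

# Function on the column: the index on hire_date cannot be used
p1 = plan("SELECT * FROM s_emp WHERE substr(hire_date,1,4) = '2005'")

# Equivalent bounded range on the bare column: index search
p2 = plan("SELECT * FROM s_emp "
          "WHERE hire_date >= '2005-01-01' AND hire_date < '2006-01-01'")

print(p1)  # a scan of s_emp
print(p2)  # a search using idx_hire
```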

Factors that can Cause an Index not to be Used

2) Comparing incompatible data types:

  SELECT * FROM s_emp WHERE employee_number = 3;     -- implicit conversion on the character column
  SELECT * FROM s_emp WHERE hire_date = '12-jan-01'; -- implicit to_date conversion

Solution: compare a value of the matching type explicitly:

  SELECT * FROM s_emp WHERE employee_number = '3';
  SELECT * FROM s_emp WHERE hire_date = to_date('12-jan-01');
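SQLite's dynamic typing differs from Oracle's implicit conversions, but the moral is the same: make the literal's type match the column. In this sketch the employee number is stored as text in an untyped column, and the numeric comparison silently finds nothing:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Column declared without a type: SQLite applies no implicit conversion here
con.execute("CREATE TABLE s_emp(employee_number)")
con.execute("INSERT INTO s_emp VALUES ('3')")  # stored as TEXT

wrong = con.execute("SELECT * FROM s_emp WHERE employee_number = 3").fetchall()
right = con.execute("SELECT * FROM s_emp WHERE employee_number = '3'").fetchall()
print(wrong, right)  # [] vs [('3',)]
```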

Factors that can Cause an Index not to be Used

3) Using NULL and NOT NULL:

  SELECT * FROM s_emp WHERE title IS NOT NULL;
  SELECT * FROM s_emp WHERE title IS NULL;

Since the column title has NULL values and is compared to a NULL value, the index cannot be used.

Solution:

  SELECT * FROM s_emp WHERE title >= '';

or use an Oracle hint:

  SELECT /*+ index (s_emp) */ * FROM s_emp WHERE title IS NULL;

Oracle hints are always enclosed in /*+ */ and must come directly after the SELECT keyword; the index hint causes indexes to be used.

Factors that can Cause an Index not to be Used

4) Adding additional criteria in the WHERE clause on a column covered by a different index:

  SELECT * FROM s_emp WHERE title = 'Manager' AND salary = 100000;

The columns title and salary have separate indexes.

Solution: use an Oracle hint (s_emp is the table name; hints are enclosed in /*+ */ directly after the SELECT keyword, and the index hint causes indexes to be used):

  SELECT /*+ index (s_emp) */ * FROM s_emp WHERE title = 'Manager' AND salary = 100000;

Make sure the most Restrictive Indexes are being Used by using Oracle hints

  SELECT COUNT(*)
  FROM vehicle
  WHERE assembly_location_code = 'AP24A'
  AND production_date = '06-apr-01';

  COUNT(*)
  ----------
  787

  Elapsed: 00:00:10.00

This does not use an index; notice the elapsed time of 10 seconds.

Make sure the most Restrictive Indexes are being Used by using Oracle hints

  SELECT /*+ index (vehicle FKI_VEHICLE_1) */ COUNT(*)
  FROM vehicle
  WHERE assembly_location_code = 'AP24A'
  AND production_date = '06-apr-01';

  COUNT(*)
  ----------
  787

  Elapsed: 00:00:00.88

This does use an index; notice it takes less than 1 second. USE THE MOST SELECTIVE INDEX, the one that will return the fewest records.

Some Idiosyncrasies

• Condition order: the order of the conditions in your WHERE clause can affect performance
• OR may stop an index being used
  - Break the query apart and use UNION


Optimize Necessary Table Scans

• There are many occasions where a table scan is the only option; if so, consider the parallel query option
• Try to reduce the size of the table
  - Adjust PCTFREE and PCTUSED
  - Relocate infrequently used long columns
• Improve the caching of the table
  - Use the CACHE hint or table property
  - Implement KEEP and RECYCLE pools
• Partition the table
• Consider the fast full index scan

IN Lists

  SELECT empno FROM emp WHERE deptno IN (10,20,30)

• May be rewritten as:

  SELECT empno FROM emp WHERE deptno = 10
  UNION ALL
  SELECT empno FROM emp WHERE deptno = 20
  UNION ALL
  SELECT empno FROM emp WHERE deptno = 30
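The rewrite is a strict equivalence as long as the IN-list values are distinct, since each branch then matches a disjoint set of rows; a quick check in Python/SQLite:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp(empno INTEGER, deptno INTEGER)")
con.executemany("INSERT INTO emp VALUES (?,?)",
                [(1, 10), (2, 20), (3, 30), (4, 40)])

in_list = con.execute(
    "SELECT empno FROM emp WHERE deptno IN (10,20,30) ORDER BY empno").fetchall()

# Each branch matches a disjoint set of rows, so UNION ALL adds no duplicates
union_all = con.execute("""
    SELECT empno FROM emp WHERE deptno = 10
    UNION ALL SELECT empno FROM emp WHERE deptno = 20
    UNION ALL SELECT empno FROM emp WHERE deptno = 30
    ORDER BY empno""").fetchall()

print(in_list, union_all)  # identical results
```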

Data Partitioning

• If you are designing a database that could potentially become very large, holding millions or billions of rows, consider horizontally partitioning your large tables
  - Horizontal partitioning divides what would typically be a single table into multiple smaller tables
  - The advantage is that it is generally much faster to query a single small table than a single large one
• For example, if you expect to add 10 million rows a year to a transaction table, after five years it will contain 50 million rows
  - In most cases you may find that most (although not all) queries on the table ask for data from a single year
  - If so, partitioning the table into a separate table for each year of transactions can significantly reduce the overhead of the most common queries
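Manual horizontal partitioning can be sketched as one table per year plus a UNION ALL view for the rarer cross-year queries (Python/SQLite with invented names; Oracle's partitioned tables automate this pruning):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# One physical table per year stands in for a range partition
for year in (2004, 2005):
    con.execute(f"CREATE TABLE trans_{year}(id INTEGER, amount REAL)")
con.executemany("INSERT INTO trans_2004 VALUES (?,?)", [(1, 9.0), (2, 11.0)])
con.executemany("INSERT INTO trans_2005 VALUES (?,?)", [(3, 5.0)])

# Cross-year queries go through a view that unions the partitions
con.execute("""CREATE VIEW trans_all AS
               SELECT 2004 AS yr, id, amount FROM trans_2004
               UNION ALL
               SELECT 2005 AS yr, id, amount FROM trans_2005""")

# The common single-year query touches only its own small table
one_year = con.execute("SELECT COUNT(*) FROM trans_2005").fetchone()[0]
all_years = con.execute("SELECT COUNT(*) FROM trans_all").fetchone()[0]
print(one_year, all_years)
```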

When Joining

• Make sure everything that can be joined is joined (for 3 or more tables). Instead of:

  SELECT * FROM t1, t2, t3
  WHERE t1.emp_id = t2.emp_id
  AND t2.emp_id = t3.emp_id

add the transitive condition:

  SELECT * FROM t1, t2, t3
  WHERE t1.emp_id = t2.emp_id
  AND t2.emp_id = t3.emp_id
  AND t1.emp_id = t3.emp_id;

• Make sure the smaller table is first in the FROM clause

Joining too Many Tables

• The more tables, the more work for the optimizer
• The best plan may not be achievable

  Tables   Permutations
  1        1
  2        2
  3        6
  4        24
  5        120
  6        720
  7        5,040
  8        40,320
  9        362,880
  10       3,628,800
  11       39,916,800
  12       479,001,600
  13       6,227,020,800
  14       87,178,291,200
  15       1,307,674,368,000

(The number of candidate join orders for n tables is n!.)
92

Use ARRAY Processing

• Retrieve or insert rows in batches, rather than one at a time
• Methods of doing this are language specific
• Example: suppose a new user registers in our database; we have to create an entry in the Similarities table for each pair of the new user with every existing user
• Instead of selecting all the existing users and performing the insertions individually, use a single set-oriented statement:

  insert into similarities (user1, user2, similarity)
  select :new_user_id, user_id, 0
  from users;
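A sketch of both batching ideas in Python/SQLite (names follow the slide's example): executemany sends many rows per call, and the set-oriented INSERT ... SELECT creates every pair for the new user in one statement:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users(user_id INTEGER)")
con.execute("CREATE TABLE similarities(user1 INTEGER, user2 INTEGER, similarity REAL)")

# Batch the inserts instead of one execute() call per row
con.executemany("INSERT INTO users VALUES (?)", [(1,), (2,), (3,)])

new_user_id = 4
# One set-oriented statement builds all pairs for the new user
con.execute("""INSERT INTO similarities (user1, user2, similarity)
               SELECT ?, user_id, 0 FROM users""", (new_user_id,))

rows = con.execute("SELECT * FROM similarities ORDER BY user2").fetchall()
print(rows)
```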

Consider PL/SQL for Tricky SQL

• With SQL you specify the data you want, not how to get it
  - Sometimes you need to dictate your retrieval algorithm explicitly
• For example:
  - Getting the second-highest value
  - Correlated updates
  - SQL with multiple complex correlated subqueries
  - SQL that seems too hard to optimize unless it is broken into multiple queries linked in PL/SQL
• Use explicit instead of implicit cursors
  - Implicit cursors always take longer than explicit cursors because they perform an extra fetch to make sure there is no more data
• Eliminate cursors wherever possible

When your SQL is Tuned, Look to your Oracle Configuration

• When SQL is inefficient there is limited benefit in investing in Oracle server or operating system tuning
• However, once the SQL is tuned, the limiting factor for performance will be the Oracle and operating system configuration
• In particular, check for internal Oracle contention, which typically shows up as latch contention or unusual wait conditions (buffer busy, free buffer, etc.)

Other Parameters

• OPTIMIZER_MAX_PERMUTATIONS
  - Remember the "too many joins" problem?
  - Default is 80,000
  - Can lead to long parse times
  - Altering it can lead to non-optimal plan selection
• OPTIMIZER_INDEX_CACHING
  - Represents the number of index blocks expected to be found in the buffer cache
  - Range 0-99
  - Default is 0, which implies that index access will require a physical read
  - Should be set to around 90

Other Parameters

• OPTIMIZER_INDEX_COST_ADJ
  - Represents the cost of index access relative to a full table scan
  - Range 1-10000
  - Default is 100, meaning index access is as costly as a full table scan
  - Should be between 10 and 50 for OLTP, and approximately 50 for DSS
• DB_FILE_MULTIBLOCK_READ_COUNT
  - Setting it too high can cause full table scans
  - You can compensate by adjusting OPTIMIZER_INDEX_COST_ADJ
• DB_KEEP_CACHE_SIZE
• DB_RECYCLE_CACHE_SIZE
• DB_BLOCK_HASH_BUCKETS

References

• Dennis Shasha and Philippe Bonnet, Database Tuning: Principles, Experiments and Troubleshooting Techniques, Morgan Kaufmann Publishers, 2002
• Mark Levis, Common Tuning Pitfalls of Oracle's Optimizers, Compuware
• Duane Spencer, TOP Tips for ORACLE SQL Tuning, Quest Software, Inc.
