Prepared by Saurabh Kumar Mishra
Performance Engineering & Enhancement offerings (PE2)
Infosys Technologies Limited (NASDAQ: INFY)
saurabhkumar_mishra@infosys.com

This paper describes a framework for tuning Oracle queries that aims to deliver the maximum improvement in execution time. It includes practical examples that show how to use the framework.

Introduction
RDBMS systems such as OLTP databases, which process a huge number of records per unit of time, rely on complex logic that leads to the use of aggregate functions, UNION, UNION ALL, MINUS, EXISTS, GROUP BY, views and so on. When tuning a query, developers and DBAs commonly start by generating an explain plan through the plan table or by generating trace files for a session. However, the most important step is to analyse what is affecting query performance before any tuning starts. This paper provides a stepwise approach for tuning PL/SQL and SQL. SQL has been used extensively in RDBMS systems over time; in this paper I discuss Oracle specifically, but with a generalized approach. In Oracle, SQL execution time depends largely on components such as the query optimizer and the query execution engine.
Query Optimizer
In a typical query optimizer, the parsed query comes from the parser as the input to the optimizer. The query transformer first rewrites the parsed query into an equivalent form where that is likely to yield a better plan. Potential execution plans are then generated based on the available access paths and hints. The estimator uses the statistics held in dictionary views such as dba_tables, dba_tab_columns, dba_indexes and dba_tab_partitions to calculate the cost of each plan. These dictionary views play an important role because they hold statistics such as the data distribution and storage characteristics of the tables, indexes and partitions accessed by the SQL. Once the costs are calculated, the plan generator selects the lowest-cost plan and passes it to the row source generator for execution. So, for the execution of any query, the key role is played by the statistics available in the dictionary tables/views, and generating them can be stated as the first prerequisite for any query tuning activity.
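Since the estimator depends on these statistics, they are usually refreshed before tuning starts. A minimal sketch using the DBMS_STATS package follows. It assumes that the current schema owns the sample table I_CLAIMS used later in this paper; the sampling and cascade settings are illustrative choices, not settings taken from the original analysis.

```sql
-- Refresh optimizer statistics for one table and its indexes.
-- Assumption: the table I_CLAIMS lives in the current schema.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => USER,
    tabname          => 'I_CLAIMS',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,  -- let Oracle choose the sample size
    cascade          => TRUE                          -- also gather statistics on the indexes
  );
END;
/
```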
Copyright 2008 Saurabh Kumar Mishra All rights reserved.
Framework
o Collect the timing for the existing untuned query.
o Generate explain plan.
o Analyse explain plan.
o Recommendation application.
o Functional testing.
Descriptions
Each step in the framework plays a significant role in query tuning, but the first and the last steps are vital: collecting the before and after results, that is, the response time together with the number of records each query returns. It is equally important to record the load on the server and the database at the time the queries are executed and the results are gathered. Some tuning experts also use cost as the baseline for tuning queries and measuring the improvement, which can be somewhat distracting when response time is the major factor in tuning. SQL tuning can generally be divided into:
o Tuning for best response time.
o Tuning for best throughput (i.e. less usage of resources in the database and server).
The best tuning practice, however, is for the query to give the best response time using the least resources, so keep both cost and response time in mind while tuning SQL queries. The next step, before you actually begin generating the explain plan, is the collection and evaluation of the statistics available for the objects (tables and indexes).
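The before and after timings described above can be captured in several ways. One possible sketch is an anonymous PL/SQL block that uses DBMS_UTILITY.GET_TIME, which returns elapsed time in hundredths of a second, and counts the rows fetched. The inner SELECT is only a placeholder for the query under test, and output must be enabled (for example with SET SERVEROUTPUT ON in SQL*Plus) to see the result.

```sql
DECLARE
  l_start NUMBER;
  l_rows  NUMBER := 0;
BEGIN
  l_start := DBMS_UTILITY.GET_TIME;            -- hundredths of a second
  -- Placeholder query: replace with the statement being tuned.
  FOR r IN (SELECT table_name FROM user_tables) LOOP
    l_rows := l_rows + 1;                      -- fetch every row, as the application would
  END LOOP;
  DBMS_OUTPUT.PUT_LINE('rows fetched: ' || l_rows ||
                       ', elapsed seconds: ' ||
                       ((DBMS_UTILITY.GET_TIME - l_start) / 100));
END;
/
```

Running the same block before and after a change, under a comparable load, gives the two figures that the framework compares.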
num_distinct and sample_size for tables, distinct_keys and sample_size for indexes. I will leave the analysis part to you using the best recommendations suggested above.
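The statistics named here can be read directly from the dictionary views. The sketch below uses the USER_ views, which expose the same statistics columns as the DBA_ views mentioned earlier, and assumes the sample tables of this paper as object names.

```sql
-- Column-level statistics: distinct values and sample size per column.
SELECT table_name, column_name, num_distinct, sample_size, last_analyzed
FROM   user_tab_columns
WHERE  table_name IN ('I_CLAIMS', 'M_N_AUDIT');

-- Index-level statistics: distinct keys and sample size per index.
SELECT index_name, table_name, distinct_keys, sample_size, last_analyzed
FROM   user_indexes
WHERE  table_name IN ('I_CLAIMS', 'M_N_AUDIT');
```

An old last_analyzed date, or a sample_size that is very small relative to the table, is a hint that the statistics may need to be regathered before tuning.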
Collecting these few points at the start will be of great help in the later stages of tuning. In the rest of the paper I use a complex example to explain each of these parameters.
Query 1: A module query from a batch processing system. In its first step it collects all the matching values from the I_claims (partitioned) and M_n_audit tables based on filters.
SELECT *
FROM   (SELECT M_ID, N_A, L_NAME, D_R, E_DATE, T_DATE
        FROM   (SELECT MBNDCA.M_ID, MBNDCA.N_A, MBNDCA.L_NAME, MBNDCA.D_R,
                       MBNDCA.E_DATE, MBNDCA.T_DATE, R_FLAG,
                       RANK () OVER (PARTITION BY MBNDCA.M_ID, MBNDCA.N_A
                                     ORDER BY MBNDCA.VERSION_N DESC,
                                              MBNDCA.M_NA_VER_ID DESC) R
                FROM   M_N_AUDIT MBNDCA
                WHERE  UPPER (M_ID) = UPPER (:V_M_ID)
                AND    (   20071001 BETWEEN MBNDCA.E_DATE AND MBNDCA.T_DATE
                        OR MBNDCA.E_DATE BETWEEN 20071001 AND 20071231)
                AND    MBNDCA.LAST_UPDT_DATE <= SYSTIMESTAMP)
        WHERE  R = 1
        AND    R_FLAG = 'A') MBNDCA,
       I_CLAIMS IC
WHERE  UPPER (MBNDCA.N_A) = UPPER (IC.N_A)
AND    IC.TR_STATUS != :TR_STATUS_R
AND    (   DECODE ('CL', :C_TYPE, IC.C_ID, NULL) NOT LIKE :C_CUSTOMER
        OR DECODE ('PDP', :C_TYPE_PDP, IC.C_ID, NULL) = :PDP_CUST
        OR DECODE ('M', :C_MED, IC.C_ID, NULL) BETWEEN :PDP_CUST AND :MAPD_CUST_END
        OR DECODE (:V_C_TYPE_CODE, :CTYPE_MAPD, IC.C_ID, NULL)
              BETWEEN :MAPD_CUST_START AND :MAPD_CUST_END);
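One way to generate the explain plan for such a statement is EXPLAIN PLAN together with DBMS_XPLAN.DISPLAY. To stay short, the sketch below explains only the join between the two tables of Query 1 rather than the full statement. The statement id 'Q1' is an arbitrary label chosen for this illustration, and the default PLAN_TABLE is assumed to be available.

```sql
-- Store the plan for a reduced form of Query 1 under the label 'Q1'.
EXPLAIN PLAN SET STATEMENT_ID = 'Q1' FOR
SELECT ic.*
FROM   m_n_audit mbndca, i_claims ic
WHERE  UPPER(mbndca.n_a) = UPPER(ic.n_a);

-- Display operations, cost and the access/filter predicate section.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE', 'Q1', 'TYPICAL'));
```

For a partitioned table the displayed plan also carries Pstart and Pstop columns, which show whether all partitions are being accessed.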
Collection of Parameters
1) Object Level:
o Index used: None.
o Tables used (along with record count details):
  o I_claims: 1 million rows in each partition.
  o M_n_audit: 10K records.
o Full table scans: I_claims, M_n_audit.
o Partitions accessed: all partitions (#1-#6); the data is in the 5th partition.
o Functions used in join columns: UPPER.
o Partition key utilization: No (I_claims is range partitioned on date_submitted).
o Access predicates:
  o UPPER (MBNDCA.N_A) = UPPER (IC.N_A)
o Filter predicates:
  o Step (2) in the explain plan, for view MBNDCA.
  o Step (4) in the explain plan, for table I_claims.
2) Query Level:
a. Cost is increased because all partitions of the Internal_claims (I_claims) table are accessed in the query.
b. A hash join is applied because a big table is joined with a comparatively small table, which in turn results in a full table scan (FTS) on both tables.
c. Total cost: 43K. Actual execution time: 10 mins; predicted time: 511 secs.
This completes our analysis of a demo explain plan, which exposes most of the loopholes that need to be fixed to tune this query and bring down the response time. The analysis above serves as the input for giving recommendations when tuning any such query.
Recommendations
Let us start analysing the results of the explain plan. Again this should be done in two steps, one at query level and the other at object level. For the sample query above:
do while using partitioned tables in their queries. So the partition key should be utilized by specifying its range in the query.
Query Level Recommendations:
(Evaluate Usage of Functions) Because UPPER is applied to the column, the optimizer will not use a normal index on it even if one is present; it expects a function-based index. So, based on the functionality and the data in the tables, the use of UPPER should be evaluated and a suitable index created, which in turn can remove the full table scan on both tables.
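A function-based index matching the join predicate of Query 1 might look like the sketch below. The index names are hypothetical, and LOCAL assumes that a partition-aligned index on the partitioned I_CLAIMS table is acceptable. Whether UPPER is needed at all should still be decided from the data first.

```sql
-- Hypothetical index names; tables and columns are those of Query 1.
-- Indexing the expression UPPER(n_a) lets predicates written on
-- UPPER(n_a) use an index instead of forcing a full table scan.
CREATE INDEX i_claims_upper_na_ix  ON i_claims  (UPPER(n_a)) LOCAL;
CREATE INDEX m_n_audit_upper_na_ix ON m_n_audit (UPPER(n_a));
```

After creating the indexes, the explain plan should be regenerated to confirm that the optimizer actually picks them up.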
(Evaluate Joins) The hash join should be converted to a nested loop join, or vice versa, after evaluating the benefit in response time.
(Evaluate Hints) Hints can be used to test index usage if the optimizer does not pick up a newly created index. As a regular practice, however, hints should be removed when deploying code to production/UAT, since statistics are expected to be well generated there, which allows the optimizer to use the proper indexes.
(Evaluate Nested Queries) On table M_N_AUDIT a nested query exists based on RANK over a partition, which may be required functionality. Options have to be explored for converting this functionality into a direct join of this table with I_claims in the FROM clause.
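To illustrate the partition-key and hint recommendations together, the sketch below adds a range on date_submitted and an INDEX hint to a reduced form of the join from Query 1. The date range, the DATE datatype of date_submitted and the index name i_claims_upper_na_ix are assumptions made for this illustration only; the real range has to come from the business rules of the batch.

```sql
SELECT /*+ INDEX(ic i_claims_upper_na_ix) */  -- test-only hint (hypothetical index name)
       ic.*
FROM   m_n_audit mbndca, i_claims ic
WHERE  UPPER(mbndca.n_a) = UPPER(ic.n_a)
-- A range on the partition key lets the optimizer prune to the
-- partitions that can contain matching rows.
AND    ic.date_submitted >= DATE '2007-10-01'
AND    ic.date_submitted <  DATE '2008-01-01';
```

Comparing the plan and timing with and without the hint shows whether the index helps; the hint itself would then be removed before deployment, as recommended above.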
These recommendations can be derived from the guidelines and best practices for writing Oracle SQL, which result in high-performing queries. Some of the general best practices for writing SQL are:
o Semi-joins (EXISTS and IN). Roger Schrag [2] says: If the main body of your query is highly selective, then an EXISTS clause might be more appropriate to semi-join to the target table. However, if the main body of your query is not so selective and the subquery (the target of the semi-join) is more selective, then an IN clause might be more appropriate.
o Anti-joins (NOT EXISTS and NOT IN), with or without nulls. Roger Schrag [2] says: First consider how null values should be handled when deciding whether to use NOT EXISTS or NOT IN. If you need the special semantics provided by NOT IN, then your decision has been made. Otherwise, you should next consider whether or not the query might benefit from a merge or hash anti-join. If so, then you probably ought to choose NOT IN. If you decide to go with NOT IN but do not want the expression to evaluate to false if a null value is found, then make sure the subquery cannot return a null value. If there is no chance that the query will benefit from a merge or hash anti-join and the special semantics of NOT IN are not desired, then you might want to select the NOT EXISTS construct so that there is a better chance Oracle will perform an efficient filter instead of an inefficient nested loops anti-join.
o Hervé Deschamps [3] says:
  o Avoid NOT IN or != on indexed columns. They prevent the optimizer from using indexes. Use WHERE amount > 0 instead of WHERE amount != 0 (valid when amount cannot be negative).
  o Avoid writing WHERE project_category IS NOT NULL. Nulls can prevent the optimizer from using an index.
  o Consider using IN or UNION in place of OR on indexed columns. ORs on indexed columns can cause the optimizer to perform a full table scan.
  o Avoid calculations on indexed columns. Write WHERE approved_amt > 26000 * 3 instead of WHERE approved_amt / 3 > 26000.
  o Consider replacing outer joins on indexed columns with UNIONs. A nested loop outer join takes more time than a nested loop unioned with another table access by index.
  o WHERE EXISTS sub-queries can be better than a join if you can drastically reduce the number of records in the driver query. Otherwise, a join is better.
  o WHERE EXISTS can be better than a join when you are driving from parent records and want to make sure that at least one child exists. The optimizer knows to bail out as soon as it finds one record. A join would get all records and then apply DISTINCT to them!
o Evaluate views. If a view joins 3 extra tables to retrieve data that you do not need, don't use the view!
When joining 2 views that themselves select from other views, check that the 2 views you are using do not join the same tables!
Avoid multiple layers of views. For example, look for queries based on views that are themselves built on views. This may be desirable for encapsulation from a development point of view, but from a performance point of view you lose control and understanding of exactly how much work your query will generate for the system.
Look for tables/views that add no value to the query. Try to remove table joins by getting the data from another table in the join.
Remove unnecessary SQL overheads. Try to reduce the number of joins in a cursor query by considering the number of rows you need in the output. For example, suppose the EMP table has 100,000 rows and you join it with the DEPT table and apply a GROUP BY to get about 2,000 rows.
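Example:
Declare
  Cursor c1 is
    select e.deptno, e.category, d.description, count(*) cnt
    from dept d, emp e
    where d.deptno = e.deptno
    group by e.deptno, e.category, d.description;
Begin
  For xx in c1 loop
    null; -- process the row here
  End loop;
End;

Alternative to the above code:
Declare
  Xdept number := -9999;
  Xdesc varchar2(60);
  Cursor c1 is
    select e.deptno, e.category, count(*) cnt
    from emp e
    group by e.deptno, e.category
    order by e.deptno; -- keeps the rows of one department together
  Cursor c2 is
    select d.description from dept d where d.deptno = Xdept;
Begin
  For xx in c1 loop
    -- look up the description only when the department changes
    If Xdept != xx.deptno then
      Xdept := xx.deptno;
      Open c2;
      Fetch c2 into Xdesc;
      Close c2; -- must be closed before it can be opened again
    End if;
    null; -- process the row here
  End loop;
End;

Here 12 is the number of distinct departments returned by the join in the query, so the DEPT lookup runs only 12 times instead of DEPT being joined to every EMP row.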
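A set-based variant of the same idea is sketched below. It is a different technique from the cursor approach shown above: EMP is aggregated first in an inline view, and only the much smaller aggregated result is joined to DEPT. The sketch assumes that DEPTNO is unique in DEPT.

```sql
SELECT e.deptno, e.category, d.description, e.emp_count
FROM   (SELECT deptno, category, COUNT(*) AS emp_count
        FROM   emp
        GROUP  BY deptno, category) e,   -- aggregate the large table first
       dept d
WHERE  d.deptno = e.deptno;              -- join only the aggregated rows
```

The optimizer is free to transform such a statement, so the explain plan should be checked to confirm that the aggregation really happens before the join.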
General guidelines while coding (Hervé Deschamps [3]):
1) Understand the data. Look around the table structures and data. Get a feel for the data model and how to navigate it.
2) Do not code large, complex PL/SQL blocks; break them down into smaller, simpler, self-contained blocks.
3) While writing SQL, check whether the columns referenced in the queries have indexes. These columns can be the select-list columns and any required join or sort columns.
4) Consider adding small, frequently accessed columns (not frequently updated) to an existing index. This will enable some queries to work only with the index, not the table.
5) The usage of IOTs and index clusters should be checked.
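Guideline 4 can be illustrated with the EMP table from the earlier example. The index name is hypothetical, and the sketch assumes that queries filter on deptno and read only category.

```sql
-- Hypothetical index: category is appended to an index on deptno.
CREATE INDEX emp_deptno_category_ix ON emp (deptno, category);

-- Both referenced columns are stored in the index, so this query can be
-- answered from the index alone, without visiting the table.
SELECT category, COUNT(*)
FROM   emp
WHERE  deptno = 10
GROUP  BY category;
```

The trade-off is that every extra indexed column adds maintenance cost on inserts and updates, which is why the guideline restricts it to columns that are not frequently updated.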
You can use and add to these general guidelines while tuning the queries of any application; as applications vary, different aspects of SQL (including the new features available with 10g and 11g) will be utilized. This completes the query tuning recommendations. After applying the recommendations to the application, two final steps should follow:
o Functional testing of the changes made while applying the recommendations.
o Analysis of the query improvement gained, which includes analysing the explain plan again (w.r.t. cost and response time) and checking whether the required overall application improvement has been gained. If not, the tuning cycle should start again from generating the explain plan.
Once all the above steps are done, a query is said to be TUNED.
Conclusion
A query tuning exercise does not only involve tuning the identified queries; it is also important to execute all the steps mentioned in this paper to get the maximum improvement in response time. The framework described above is already used by many DBAs and SQL tuning specialists, but this paper develops an understanding of the existing engineering framework, which is necessary for making an effective contribution to the area of query optimization.
References
1. What's Up with Dbms Stats? By Terry Sutton, Database Specialists, Inc.
2. Speeding Up Queries with Semi-Joins and Anti-Joins: How Oracle Evaluates EXISTS, NOT EXISTS, IN, and NOT IN. By Roger Schrag, Database Specialists, Inc.
3. http://www.iherve.com/oracle/tune100.htm, by Hervé Deschamps.
Further Reading
This paper can be extended to include all best practices for SQL and PL/SQL coding, along with PL/SQL tuning. Optimizer behaviour can also be studied to evaluate different sets of queries based on database parameter settings.