
LATCH FREE WAIT CAUSES

HARD PARSES CAUSED BY BIND VARIABLES NOT IN USE


SELECT * FROM v$sysstat

WHERE name like 'parse count%';

If parse count (hard) is close to 20% or more of parse count (total), then the application is probably not using bind variables.
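The difference shows up when the same statement is executed with different values. The following sketch (table and column names are illustrative only) contrasts the two styles; each distinct literal forces a hard parse, while the bind-variable version is parsed once and reused:

-- Literal values: every distinct value creates a new cursor (hard parse)
SELECT ename FROM emp WHERE empno = 7369;
SELECT ename FROM emp WHERE empno = 7499;

-- Bind variable: one shared cursor, soft parses on reuse
VARIABLE v_empno NUMBER
EXEC :v_empno := 7369
SELECT ename FROM emp WHERE empno = :v_empno;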
BUFFER CACHE CHAINING (UNDERSIZED BUFFER CACHE)
column event format a30
column username format a20
column state format a10 trunc
column p1 format 999999999999 heading "P1"
column p2 format 999999999999 heading "P2"
column p3 format 99999 heading "P3"
set lin 132

SELECT username, a.sid, a.event, a.seq#, a.seconds_in_wait, a.wait_time,
       a.p1, a.p2, a.p3, a.state, a.p1raw, a.p2raw, a.p3raw
FROM   V$SESSION_WAIT a, V$SESSION b
WHERE  a.sid = b.sid
AND    NOT (a.event like 'SQL%')
AND    NOT (a.event like '%message%')
AND    NOT (a.event like '%timer%')
AND    NOT (a.event like '%pipe get%')
AND    NOT (a.event like '%jobq slave wait%')
AND    NOT (a.event like '%null event%')
AND    NOT (a.event like '%wakeup time%')
ORDER  BY wait_time desc, event

/
You can also look at the WAIT_TIME column in the V$LATCH view to determine, system wide, whether any significant latching is occurring. Here is an example of a query against the V$LATCH view; the wait time listed is in microseconds.
column name format a30
column "Get/Miss Ratio" format 999.99
column "Imm Get/Miss Ratio" format 999.99
column spin_gets format 999999999999
column wait_time format 999999999999

select name,
       (misses / decode(gets, 0, 1, gets)) * 100 "Get/Miss Ratio",
       (immediate_misses / decode(immediate_gets, 0, 1, immediate_gets)) * 100 "Imm Get/Miss Ratio",
       spin_gets,
       wait_time
from   v$latch
where  wait_time > 0;

REDOLOG BUFFER CONTENTION

The initialization parameter log_buffer determines the size of the redo log buffer. The default is set to four times the maximum data block size for the host operating system.

The waits for the redo writing latch can be found in V$LATCH as seen in this query:
column name format a30

select name, gets, misses, wait_time
from   v$latch
where  name like '%redo%';

The following query determines the miss ratio and the "immediate" miss ratio for redo log latches.
Col MISS_RATIO format 999.99
Col IMM_MISS_RATIO format 999.99

SELECT substr(ln.name, 1, 20), gets, misses, immediate_gets, immediate_misses,
       (misses / decode(gets, 0, gets + 1, gets)) * 100 MISS_RATIO,
       (immediate_misses /
          decode(immediate_gets + immediate_misses, 0,
                 immediate_gets + immediate_misses + 1,
                 immediate_gets + immediate_misses)) * 100 IMM_MISS_RATIO
FROM   v$latch l, v$latchname ln
WHERE  ln.name = 'redo allocation'
AND    ln.latch# = l.latch#
/

Table 1: Contention Statistics in the Redo Log Buffer

Statistic Name: Redo buffer allocation retries
Meaning: Indicates that a user process has had to wait for space in the redo log buffer. Increase the size of the redo log buffer if you see this latch causing wait issues. This might also imply an IO problem with the online redo logs (since the process of writing to the online redo logs can cause delays in allocating space to the redo log buffer).

Statistic Name: Redo writer latching time
Meaning: High values over a short period of time can indicate latching issues associated with the LGWR process. This might be a result of disk contention causing LGWR writes to the online redo logs to be delayed.

Statistic Name: Redo log space wait time
Meaning: Indicates sessions have waited for space to be allocated from the redo log buffer. High values can indicate latch contention for the redo logs. This could indicate an insufficient number of redo logs, or redo logs that are not sized properly. Additionally, this can indicate IO problems with the disks that the redo logs are located on.

Most important, however, are the wait times reflected in V$SYSTEM_EVENT. The following table lists the wait events you might see in V$SYSTEM_EVENT that are associated with redo contention issues:

Table 2: Wait Events seen in V$SYSTEM_EVENT

Wait Event: log buffer space
Meaning: Could indicate that there is some form of contention occurring with the online redo logs. This might be disk contention, or perhaps system resource issues. This wait event might also imply that the redo log buffer is too small.

Wait Event: log file switch (archiving needed)
Meaning: Indicates that a log file that Oracle tried to switch to was in need of archiving, which means the ARCH process is having problems. To solve this problem, ensure that there is enough space in the archive log destinations. Also make sure there is no IO contention for the online redo logs. Finally, consider increasing the number of ARCH processes.

Wait Event: log file switch (checkpoint incomplete)
Meaning: Implies that there is some contention with regard to the DBWR process. Determine if there is an IO problem, and correct it. In some cases, if you don't have enough online redo logs available, this event may appear.

Wait Event: log file sync
Meaning: Occurs after a commit or rollback when the redo log buffer is flushed to the online redo logs. If significant waits are seen for this event, consider reducing the size of the redo log buffer. Also consider batching your commits, and check for IO problems writing to the online redo log files.

If you have identified contention caused by the redo allocation latch, you can increase the number of these latches via the LOG_PARALLELISM parameter in Oracle 9i. This parameter allows parallel redo generation and can increase the throughput of certain update-intensive workloads. It is usually only used on high-end servers that have more than 16 processors. Oracle Corporation recommends setting LOG_PARALLELISM to a value between 2 and 8 when running on systems with 16 to 64 processors. Setting LOG_PARALLELISM to values greater than 8 is not currently recommended.

The redo copy latch can be examined with the following query:

col name format a15
col WILLING_TO_WAIT format 999.99
col NO_WAIT format 999.99

select name, gets, misses,
       (misses / gets) * 100 WILLING_TO_WAIT,
       immediate_gets, immediate_misses,
       (immediate_misses / immediate_gets) * 100 NO_WAIT
from   v$latch
where  name = 'redo copy';

If you do see a large number of gets as opposed to immediate gets for the redo copy latch, this may suggest that the redo log files are too small and log switches are occurring too frequently. A checkpoint occurs at every log file switch, and the redo log buffers are written to the redo log files. One indicator that this is the case is the presence of "checkpoint not complete" messages in the alert log.

Another way to determine whether you are having contention on the redo copy latch is the following query, which shows both background checkpoints started and completed. The values for these two statistics should ideally be equal; if they are not, a checkpoint was started before the previous checkpoint had a chance to complete.

select * from v$sysstat where name like 'background checkpoint%';

BUFFER CACHE CHAIN

Contention on this latch is caused by very heavy access to a single block. If there is heavy contention on data blocks, you will see session wait event "buffer busy wait" in v$session_wait. If you have a cache buffer chain latching problem, it will be identified by the following query:
SELECT name, gets, misses, sleeps, immediate_gets, immediate_misses FROM V$latch WHERE name = 'cache buffers chains';
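When buffer busy waits point at a hot block, the P1 (file#) and P2 (block#) values reported in v$session_wait can be resolved to a segment via dba_extents. This is a sketch; substitute the observed P1 and P2 values for the substitution variables:

select owner, segment_name, segment_type
from   dba_extents
where  file_id = &file#
and    &block# between block_id and block_id + blocks - 1;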

The v$segment_statistics view can be used to identify troublesome objects as well. Here is a query that you would use to identify them:

Select owner, object_name, tablespace_name, statistic_name, value
From   v$segment_statistics
Where  statistic_name = 'buffer busy waits';

BUFFER CACHE UNDERSIZED


col "Buffer Cache Hit Ratio" format a23

select to_char(
         (sum(decode(name, 'consistent gets', value, 0)) +
          sum(decode(name, 'db block gets', value, 0)) -
          sum(decode(name, 'physical reads', value, 0)) -
          sum(decode(name, 'physical reads direct', value, 0))) /
         (sum(decode(name, 'consistent gets', value, 0)) +
          sum(decode(name, 'db block gets', value, 0))) * 100, '999.99')
       || ' %' "Buffer Cache Hit Ratio"
from   v$sysstat st;

If it appears that the buffer cache will benefit from being made larger, then certainly consider increasing the size of the database buffer cache. Indications that the buffer cache is stressed, and perhaps needs to be resized, are high wait values for these events:
o Free buffer waits
o Buffer deadlock
o Buffer busy waits

DATABASE LOCKING
If you would like to know which sessions are blocking and which objects they are blocked on, a simple query can provide this information, as seen in this example:

Column username format a15
Column owner format a15
Column file_name format a30
Column object_name format a20

SELECT a.sid, a.serial#, a.username, b.owner, b.object_name,
       c.tablespace_name, c.file_name
from   v$session a, dba_objects b, dba_data_files c
where  a.row_wait_obj# = b.object_id
and    a.row_wait_file# = c.file_id
and    a.lockwait is not null;

REDOLOG CONFIGURATION
SELECT TO_CHAR(first_time, 'mm/dd/yyyy hh24:mi:ss')
FROM   v$log_history
WHERE  first_time < sysdate - 1
ORDER  BY recid;

This query essentially gives us the time of each log switch. If you frequently see gaps of less than fifteen minutes between log switches, you should consider increasing the size of your online redo logs. Extrapolate from the current size and switching frequency how large to make them. For example, if your online redo logs are 500k and you switch every 5 minutes, you should probably make them 1500k so they will switch every 15 minutes. Conversely, if your database is switching less frequently than once every fifteen minutes (say, every 30 minutes), you may want to downsize the online redo logs to a size at which they will switch every 15 minutes.

IO PROBLEMS WITH REDOLOG
select event, total_waits, time_waited
from   v$system_event
where  event in ('LGWR wait for redo copy', 'log switch/archive',
                 'log file sequential read', 'log file single write',
                 'log file parallel write', 'log file switch (checkpoint incomplete)',
                 'log file switch (archiving needed)', 'log file switch (clearing log file)',
                 'switch logfile command', 'log file switch completion', 'log file sync',
                 'STREAMS capture process waiting for archive log')
order  by time_waited desc;

High wait times shown in the TIME_WAITED column indicate a problem with the redo logs. Typically, high wait times in this query point to either an IO problem or a problem moving archived redo logs to another device or system (which can indicate network bandwidth issues). It's important to keep everything in perspective: if your system has been up for 4 weeks and you see three hours of waits for a given event, that's probably not much of a problem.

The log file sync wait may indicate a problem with the size of the redo log buffer. If you see large waits on this event, you may want to make the buffer either larger or smaller; it is sometimes hard to tell which will result in better performance. Generally we recommend that you configure the log buffer somewhere between 64k and 128k and tune upward from there if required.

From 10g onwards, the behavior of the initialization parameter log_buffer has changed. Metalink note 351857.1 states that from 10gR2 the size of the log buffer cannot be changed with this parameter; instead, the size is set by Oracle.

SEQUENCES
There are performance benefits when a heavily used sequence is cached. This can be demonstrated with a simple PL/SQL block:
Create sequence mycached_seq start with 1 increment by 1 cache 1000000;

Set timing on;

declare
  xyz number;
begin
  for i in 1..100000 loop
    select mycached_seq.nextval into xyz from dual;
  end loop;
end;
/

PL/SQL procedure successfully completed.
Elapsed: 00:00:10.38

Now try the same anonymous code with a NOCACHE sequence:


Create sequence mynocached_seq start with 1 increment by 1 NOCACHE;

Set timing on;

declare
  xyz number;
begin
  for i in 1..100000 loop
    select mynocached_seq.nextval into xyz from dual;
  end loop;
end;
/

PL/SQL procedure successfully completed.

Elapsed: 00:03:57.44

WHEN NOT TO CACHE SEQUENCES

The only reasons not to cache sequence numbers are:
1. Sequence numbers absolutely cannot be skipped.
2. The shared pool is too crowded and there are physical memory constraints.
3. Sequences are rarely used and there are no performance expectations.

DBMS_SHARED_POOL.KEEP TO PIN SEQUENCE

If there is a risk of losing sequence numbers because of aging or frequent shared pool flushing, then you may want to pin the sequence in the shared pool. Here's how:

1. Reference an existing sequence. For example:
select myseq.nextval from dual;

2. Verify that the dbms_shared_pool package has been created. If it does not exist in the database, log in as SYS and run the $ORACLE_HOME/rdbms/admin/dbmspool.sql script.

3. Then, execute the following command to keep the sequence:
exec sys.dbms_shared_pool.keep('myseq', 'q')

4. Verify that the sequence has been kept by querying v$db_object_cache:


select owner, name, namespace, sharable_mem, executions, kept
from   v$db_object_cache
where  type = 'SEQUENCE';

OWNER     NAME               NAMESPACE        SHARABLE_MEM EXECUTIONS KEP
--------- ------------------ ---------------- ------------ ---------- ---
ME        MYSEQ              TABLE/PROCEDURE         18769          0 YES
PERFSTAT  STATS$SNAPSHOT_ID  TABLE/PROCEDURE         18886          0 NO

Of importance here is the value of the SHARABLE_MEM column, which indicates the amount of memory used; and the KEPT column, which indicates if the sequence is pinned in the shared pool. Also, PINS indicates the number of sessions executing the sequence, while LOCKS indicates the number of sessions currently locking the sequence.
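The PINS and LOCKS columns mentioned above can be inspected with a similar query against the same view:

select owner, name, pins, locks, kept
from   v$db_object_cache
where  type = 'SEQUENCE';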
Sequences should be kept when it is important that they do not age out or get flushed, either for performance reasons or to reduce the likelihood of gaps in the sequence numbers.

SORTS TO REDUCE I/O


A sort area in memory is used to sort records before they are written out to disk. Increasing the size of this memory, by increasing the value of the initialization parameter SORT_AREA_SIZE or PGA_AGGREGATE_TARGET, lets you sort more efficiently. Consider using a two-megabyte SORT_AREA_SIZE when your sorted data exceeds 100 megabytes in size. Because SORT_AREA_SIZE is allocated per user, increasing this parameter can exhaust memory very quickly if a large number of users are logged on. Sort activity can be checked with:

select name, value from v$sysstat where name like 'sort%';

To determine if PGA_AGGREGATE_TARGET is set correctly, you can use the V$PGASTAT view. An example of the output of a query against V$PGASTAT might look like this:
NAME                                          VALUE UNIT
---------------------------------------- ---------- -------
aggregate PGA target parameter            524288000 bytes
aggregate PGA auto target                 163435776 bytes
global memory bound                           25600 bytes
total PGA inuse                             9353216 bytes
total PGA allocated                        73516032 bytes
maximum PGA allocated                     698371072 bytes
total PGA used for auto workareas                 0 bytes
maximum PGA used for auto workareas       560744448 bytes
total PGA used for manual workareas               0 bytes
maximum PGA used for manual workareas             0 bytes
over allocation count                             0
total bytes processed                    3.0072E+10 bytes
total extra bytes read/written           2.1517E+10 bytes
cache hit percentage                          65.97 percent
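The listing above can be produced with a simple query; V$PGASTAT exposes NAME, VALUE, and UNIT columns:

select name, value, unit from v$pgastat;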

In particular, the following statistics are of interest:

o aggregate PGA target parameter: the current setting of the PGA_AGGREGATE_TARGET parameter.
o aggregate PGA auto target: the amount of PGA memory that Oracle can use for work areas. This is a derived value, and a low value indicates that there is little memory available for sort areas. If this number is low, you should consider setting the PGA_AGGREGATE_TARGET parameter higher.
o global memory bound: a work-area limit that is dynamically set by Oracle, increasing or decreasing as the workload changes. If this value falls below 1MB, you should increase the PGA_AGGREGATE_TARGET parameter.
o total PGA allocated: if this value exceeds PGA_AGGREGATE_TARGET frequently, Oracle has to allocate more memory to the private work areas than was expected; if so, PGA_AGGREGATE_TARGET should be increased.
o over allocation count: a large value also indicates that PGA_AGGREGATE_TARGET is too small.

CONSIDERATIONS TO REDUCE SORTS

You may not always realize that your program statements invoke a sort. Sorting is performed by the following statements:

CREATE INDEX
GROUP BY
ORDER BY
INTERSECT
MINUS
UNION
DISTINCT
Unindexed table joins
Some correlated subqueries

Your goal should be to reduce or eliminate sorts wherever possible. Again, this starts at the application layer. Here is a list of things to look for in application code that might be causing unneeded sort operations. While these operations may well be needed, it's a good idea to review your code and make sure.

1. Avoid using the DISTINCT clause unless necessary.
2. Use the UNION ALL clause in place of the UNION clause unless duplicates need to be eliminated.
3. Try to use hash joins instead of sort-merge joins. The use of hints can cause the optimizer to choose this join.
4. Use appropriate index hints to avoid sorts.
5. The cost-based optimizer will try to avoid a sort operation when the FIRST_ROWS hint is used.
6. Make sure that your SQL query is taking advantage of the best available indexing options.
7. Review operational SQL code for unneeded sort operations, such as ORDER BY clauses.

FULL TABLE SCANS
The following query reports how many full table scans are taking place:
SELECT name, value FROM v$sysstat WHERE name LIKE '%table %' ORDER BY name;

If the number of long table scans is significant, there is a strong possibility that SQL statements in your application need tuning or indexes need to be added.

If you can identify the users who are experiencing the full table scans, you can find out what they were running to cause these scans. Below is a script that allows you to do this:
DROP VIEW full_table_scans;

CREATE VIEW full_table_scans AS
SELECT ss.username || '(' || se.sid || ') ' "User Process",
       SUM(DECODE(name, 'table scans (short tables)', value)) "Short Scans",
       SUM(DECODE(name, 'table scans (long tables)', value)) "Long Scans",
       SUM(DECODE(name, 'table scan rows gotten', value)) "Rows Retrieved"
FROM   v$session ss, v$sesstat se, v$statname sn
WHERE  se.statistic# = sn.statistic#
AND    (name LIKE '%table scans (short tables)%'
    OR  name LIKE '%table scans (long tables)%'
    OR  name LIKE '%table scan rows gotten%')
AND    se.sid = ss.sid
AND    ss.username IS NOT NULL
GROUP  BY ss.username || '(' || se.sid || ') ';

COLUMN "User Process" FORMAT a20
COLUMN "Long Scans" FORMAT 999,999,999
COLUMN "Short Scans" FORMAT 999,999,999
COLUMN "Rows Retrieved" FORMAT 999,999,999

TTITLE 'Table Access Activity By User'

SELECT "User Process", "Long Scans", "Short Scans", "Rows Retrieved"
FROM   full_table_scans
ORDER  BY "Long Scans" DESC;
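On 9i and later, an alternative is to pull the offending statements straight from the library cache via v$sql_plan. This is a sketch; it lists the text of every cached statement whose plan contains a full table scan:

select sql.sql_text
from   v$sql sql, v$sql_plan p
where  p.operation = 'TABLE ACCESS'
and    p.options = 'FULL'
and    p.address = sql.address
and    p.hash_value = sql.hash_value;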

ARCHIVING PERFORMANCE
Follow the steps below to alleviate archive bottlenecks:

o Ping-pong your redo logs from disk to disk, with redo logs 1 and 3 on disk A and redo logs 2 and 4 on disk B. This allows the ARCH process to read from one disk while LGWR writes to a separate disk.

o Archive logs should not be on the same disk as active data files or redo logs. In the case of multiplexed logs, try to multiplex such that no two members are on the same disk, and "ping-pong" them so that while a log member is being written, another isn't being read for archive at the same time on the same disk.

o Increase the initialization parameter LOG_ARCHIVE_BUFFER_SIZE to a value such as 256 or 512 kilobytes. The parameter has a default on most machines of about 150 operating system blocks, which is often inadequate for a heavily used production system.

o If you are archiving to tape, consider archiving to disk and then copying to tape in the background. You may also consider compressing your archive logs (however, not the one currently being written to) to save disk space if it is at a premium.

o Place your archive log files onto higher-speed disks.

o Increase your redo log sizes. Note, however, that if you lose your instance, you may be off the air for quite some time while your current redo log is being read to ensure that the database is intact.

o Avoid performing hot backups at the same time as overnight jobs, in particular at the same time that batch updates are occurring. Entire blocks are written to your log buffer, redo logs, and, eventually, archive logs for the duration of the hot backup on the data file being written to. If you must have 24-hour uptime, coordinate your backups so that there is little activity on the data file being backed up.
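If the archiver itself is the bottleneck, adding ARCn processes is a quick first step. LOG_ARCHIVE_MAX_PROCESSES is dynamic, so this can be done without a restart (the value 4 here is illustrative):

ALTER SYSTEM SET log_archive_max_processes = 4;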

ROW CHAINING AND MIGRATION

The following query can be used to identify tables with chaining problems:
TTITLE 'Tables Experiencing Chaining'

SELECT owner, table_name, NVL(chain_cnt, 0) "Chained Rows"
FROM   all_tables
WHERE  owner NOT IN ('SYS', 'SYSTEM')
AND    NVL(chain_cnt, 0) > 0
ORDER  BY owner, table_name;

The above query is useful only for tables that have been analyzed. Note the NVL function to replace a NULL with a zero; without it, tables that have not been analyzed would appear to have been analyzed with zero chained rows. The following steps explain how to list all of the chained rows in any selected table:

1. Create a table named CHAINED_ROWS using the following script (taken from Oracle's utlchain.sql script):
CREATE TABLE chained_rows (
  owner_name         VARCHAR2(30),
  table_name         VARCHAR2(30),
  cluster_name       VARCHAR2(30),
  partition_name     VARCHAR2(30),
  subpartition_name  VARCHAR2(30),
  head_rowid         ROWID,
  analyze_timestamp  DATE
);

2. Issue the ANALYZE command to collect the necessary statistics:


ANALYZE TABLE <table_name> LIST CHAINED ROWS;

3. Query the CHAINED_ROWS table to see a full listing of all chained rows, as shown below:
SELECT * FROM chained_rows WHERE table_name = 'ACCOUNT';

Sample Output:

Owner_name  Table_Name  Cluster_Name  Head_Rowid          Timestamp
----------  ----------  ------------  ------------------  ---------
QUEST       ACCOUNT                   00000723.0012.0004  30-SEP-93
QUEST       ACCOUNT                   00000723.0007.0004  30-SEP-93

The following is an example of how to eliminate the chained rows:


CREATE TABLE chained_temp AS
SELECT * FROM <table_name>
WHERE  rowid IN (SELECT head_rowid FROM chained_rows
                 WHERE table_name = '<table_name>');

DELETE FROM <table_name>
WHERE  rowid IN (SELECT head_rowid FROM chained_rows
                 WHERE table_name = '<table_name>');

INSERT INTO <table_name> SELECT * FROM chained_temp;

4. Drop the temporary table when you are convinced that everything has worked properly.
DROP TABLE chained_temp;

5. Clean out the CHAINED_ROWS table:


DELETE FROM chained_rows WHERE table_name = '<table_name>';
