
Informix Tuning

Contents

1 Introduction

2 Tuning Checkpoints

2.1 How Informix Deals With Disk Data

2.1.1 The Simple: Foreground Writes

2.1.2 The Smart: LRU Queues & Buffers

2.1.3 The Brute: Checkpoints

2.2 The Wise: Making The Best of Checkpoints, Foreground Writes & LRUs

2.2.1 Monitoring LRU activity and Estimating the Parameters

2.2.1.1 onstat -R

2.2.1.2 onstat -F

2.2.1.3 onstat -p

2.3 The CKPTINTVL - Making Checkpoints Less Frequent

3 Tuning The Data Dictionary (DD)

3.1 Hashing Overview

3.2 Relevant ONCONFIG Parameters

3.2.1 The DD_HASHSIZE Parameter

3.2.2 The DD_HASHMAX Parameter

3.2.3 Recommended Values

3.3 Monitoring The Data Dictionary

3.4 References

4 Tuning Memory Access

4.1 Basics Of Memory Access

4.2 The VP_MEMORY_CACHE_KB Parameter (IDS 10)

4.2.1 Recommended Configuration

4.2.2 Monitoring the VP Memory Use

4.3 The DS_NONPDQ_QUERY_MEM Parameter (IDS 10)

4.4 References

5 Tuning Update Statistics & Effects of Parallel Data Query (PDQ)

5.1 Recommended Method For Basic Update Statistics

5.2 PDQ, Parameters & Appropriate Usage

5.2.1 ONCONFIG Parameters


5.2.1.1 MAX_PDQPRIORITY

5.2.1.2 DS_MAX_QUERIES

5.2.1.3 DS_TOTAL_MEMORY

5.2.1.4 DS_MAX_SCANS

5.2.1.5 DS_NONPDQ_QUERY_MEM (IDS 10)

5.2.2 Example PDQ Configuration

5.2.3 The PDQPRIORITY Setting

5.3 Optimizing The Update Statistics Process

5.3.1 The rrt_update_statistics.sh Script

5.3.2 DBUPSPACE Parameter

5.4 References

6 BTREE Cleaner Optimization

6.1 Parameters Controlling The BTREE Cleaner

6.1.1 Number of Threads

6.1.2 Priority

6.1.3 Threshold

6.1.4 Range Size

6.2 Adjusting a Live Database

6.3 Onconfig Configuration (IDS version 10 and newer only)

6.4 Recommended Production Operation

6.5 ALICE Mode - IDS Version 11

6.6 References

7 Virtual CPU Control

7.1 Planning The Number of VP CPUs

7.1.1 Changing The Number Of VP CPUs

7.2 Monitoring The Number of Asynchronous Input/Output Workers (AIO)

7.2.1 Changing The Number Of VP AIO Workers

Introduction
Tuning a database server, not just Informix, is a major undertaking. It requires
familiarity with:

Type of application the database server is used with


Type of hardware the server is running on
Volume of transactions
The need for specific types of recovery and load balancing
And much more.
This document only attempts to gather the information RRT has used in its
installations of the Informix IDS 9.4 and IDS 10 (soon IDS 11) database. Some
ideas in this document may not apply to every environment; however, they are
worth considering in any case, to make sure that you don't spend time solving
something which already has an existing solution.

Tuning Checkpoints
In IDS version 10 and older, checkpoints can cause visible system performance
issues. The most common manifestation of a checkpoint problem is long wait
times to complete write operations in the system.
Some examples of this type of transaction in the RRT application are:

Confirming new bookings.


Modification of existing bookings.
Maintenance of fares and flights.
DCS operations that change the passenger status (Check-in, Boarding,
Printing).
Changes in bookings not reflected in the DCS system.
etc.
The basic problem with checkpoints is that all operations on the system are
suspended while a checkpoint is running. This is not a problem if the checkpoint
lasts 20 milliseconds; however, on a system not tuned for the type of operation
the RRT application performs, checkpoints can take up to 80-90 seconds,
which will cause serious problems in the live operation.
In IDS 11, IBM has addressed this problem: checkpoints no longer cause
other transactions running on the database to be suspended. IDS 10 and
9.4x, however, still have to deal with this problem.

How Informix Deals With Disk Data


In order to understand why checkpoints take a long time, the following sections
briefly review how Informix uses RAM to deal with disk reads and writes.
The Simple: Foreground Writes
This is the easiest one to understand, and also the one that needs to be avoided
the most. This is the method where each client session into the database simply
writes directly to disk whatever it is modifying.
So if one session is inserting a row into a table, the data on disk that contains the
information about that table is changed by the database server. If another
session is deleting rows, the rows are deleted on disk immediately.
Now, imagine a database server with 10,000 sessions all modifying data; the disk
will quickly get overloaded with seek/write requests. As most SQL operations alter
relatively small amounts of data (10-100 bytes per operation), the physical disk
will waste a lot of its time dealing with small operations.
This type of write operation is chosen by the database server when it cannot
use the memory-buffered method described in the LRU section. This is most often
caused by too little memory (LRU buffers) being allocated for the database engine
to use.
Please consult the IBM documentation on IDS for estimates on the LRU buffers
allocation.
The Smart: LRU Queues & Buffers
While this topic is best explained in the IBM IDS manuals, we provide a brief
general overview in this document, so that the checkpoint tuning can be properly
understood.
The LRU queues are used to manage system memory (RAM) reserved for
keeping disk data (called pages) in RAM for faster access. For read-only
operations, the LRU queue simply keeps pages that have been read from disk,
so that next time they are needed they can be retrieved from RAM instead of disk.
The LRU queues are also used to keep data modifications, instead of writing the
changes directly to disk. This is done so that multiple disk writes can be grouped
together and done at once, taking advantage of optimizations in the operating
system and disk hardware related to larger data operations.
When an application connected to the database updates a row, or inserts a row,
the database doesn't immediately write that change to disk. Instead, it puts it in an
LRU queue so that it can be written later on.
The database will write the modified pages from the LRU queues to disk based on
parameters specified in the ONCONFIG file. We will discuss the parameter as it is
specified in IDS 10; the IDS 9.4x parameters are configured along the same
principles, they are only named differently.
The parameter used to control the LRU processing is called BUFFERPOOL. A
sample value could be:

BUFFERPOOL
size=2k,buffers=10000,lrus=10,lru_min_dirty=1.0,lru_max_dirty=10.00

The values in the parameter control how and when the database
engine will use the memory area allocated to the LRU buffers.

size=2k - represents the page size that this buffer pool will be used for. IDS allows
multiple page sizes to be configured in the system, so in those cases a
BUFFERPOOL will have to be defined for each page size. For the purposes
of this document we will assume a system with one page size, namely 2K.
buffers=10000 - represents the number of pages that will be kept in RAM. The
hardware needs to have enough physical RAM to accommodate this amount
of memory. In this case, 20MB will be allocated (10000 * 2K pages).
lrus=10 - the number of groups into which the database will divide the 10000 buffers.
Each group will be handled by a separate worker that will read/write pages
to/from disk. This number can be tweaked, but it is usually set to values
related to the number of disks the database is stored on. With RAID and SAN
configurations, this physical disk information is obscured, so the value in this
parameter is set between 10-50 depending on the size of the hardware
configuration. Start with 20 and tune from there.
lru_max_dirty=10.00 - This parameter represents a percent value based on
the number of pages specified in the buffers parameter. When the number of
dirty pages (pages that contain modified data ready to be written to disk)
reaches this percent value, the database will start writing those dirty pages to
disk. For example: In a 10000 buffer configuration, 10.00% equals 1000
pages. So when the application updates/deletes/inserts enough data to use up
1000 pages, the database engine will start writing those pages to the physical
disk. Any number of dirty pages below 10% of the buffers value will stay
in the LRU memory.
lru_min_dirty=1.00 - Like the lru_max_dirty parameter, this is a percentage
value based on the buffers number. This percentage value controls when the
database engine will stop writing the dirty pages from the LRU memory onto
disk. To follow up on the lru_max_dirty example, 1% of
10000 pages is 100 pages. So when the LRU reaches 1000 dirty pages, the
database will start writing those to disk. Once it writes 900 pages, and the LRU
dirty page count is down to 100, the database will stop writing the pages to
disk and it will leave them in the LRU memory.
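To make the threshold arithmetic concrete, the page counts from the sample BUFFERPOOL
line can be reproduced with plain shell arithmetic (an illustrative sketch only; the values
come from the example above):

$ buffers=10000
$ echo "lru_max_dirty pages: $((buffers * 10 / 100))"   # 10.00% of 10000 buffers
lru_max_dirty pages: 1000
$ echo "lru_min_dirty pages: $((buffers * 1 / 100))"    # 1.00% of 10000 buffers
lru_min_dirty pages: 100
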
The Brute: Checkpoints
Checkpoints essentially do the same thing as the LRU cleaners, however they do it on a
regular interval set by the onconfig parameter CKPTINTVL. The parameter
specifies the interval, in seconds, at which the database will do an automatic
cleanup of the dirty pages in the LRU memory area. The default value of this
parameter is 300, or 5 minutes.
This means that if the application doesn't modify enough pages to trigger the
LRU writing of dirty pages to disk (lru_max_dirty), then the checkpoint is left
responsible for writing those pages to disk. This is a problem, as the checkpoint
operation will suspend all other operations on the database while it is cleaning up
the dirty pages in the LRU write buffers.
The LRU dirty page cleanup does not cause this suspend condition, so it's
best to make sure the checkpoint is not left dealing with the dirty LRU
pages.
NOTE: There are other conditions that trigger checkpoints, please look at the info
provided in the IBM online manuals for IDS. This document is addressing the
problems caused by LRU activity so only that aspect of the checkpoints will be
discussed.

The Wise: Making The Best of Checkpoints, Foreground Writes & LRUs
Putting all of the above together, the database engine will collect modified pages
until it reaches the lru_max_dirty % of buffers (1000 in the examples above).
Once that value is reached, the pages will be written to disk until the number of
dirty pages reaches the lru_min_dirty % of buffers (100 in the examples above).
On systems with large amounts of RAM (16GB +) the number of pages allocated
to the LRU buffer can be 2 million and above. With that number of buffer pages, a
10% lru_max_dirty setting would require 200,000 pages (400MB of data) to be
modified before the LRU cleanup will start.
With a 5 minute checkpoint interval, it is more likely that the checkpoint will end
up dealing with the dirty pages, thus causing a long checkpoint, and a locked up
system. Not good.
We need to make sure the LRU cleaners are doing their job before the checkpoint
starts up.
Monitoring LRU activity and Estimating the Parameters
The best way to determine the % for the LRU min/max values is to monitor the
system using the onstat -R, onstat -F and onstat -p commands.
onstat -R
The first command will show the number of dirty pages on the system at the end
of its output:

$ onstat -R | tail -4
9916 dirty, 2000000 queued, 2000000 total, 2097152 hash buckets, 2048 buffer size
start clean at 30.000% (of pair total) dirty, or 12500 buffs dirty, stop at 20.000%

Monitor this command for a few minutes, while watching the online.log file in
another terminal session. That way you will see how many dirty pages there are
when the checkpoint operation starts working. You will recognize the checkpoint
operation working when the online.log output shows something like this:

$tail -f $INFORMIXDIR/*/online.log
20:11:09 Checkpoint Completed: duration was 3 seconds.
20:11:09 Checkpoint loguniq 32069, logpos 0x13a0018, timestamp: 0xbd88b117

Note the number of dirty pages from the onstat -R command when the
checkpoint entry shows up in the online.log file. That number can be used to
determine the maximum % value for the lru_max_dirty parameter.
Here we see that there are 9916 dirty pages waiting to be written to disk, and
there are a total of 2,000,000 pages in the LRU buffer memory. That is 0.4958%.
So to trigger the LRU to clean this up before the checkpoint gets to it, the
lru_max_dirty setting would have to be at most 0.48.
To make sure we don't get too close to the checkpoint, 1/2 of that value is
probably a safer setting to begin with, so something like 0.25 is a better first try.
There is an important point to be noted here. The % used to determine when the
engine starts writing the pages from an LRU is applied to the individual LRU
queues, not to the overall buffer pool.
So for the above example, with 48 LRU queues and 2 million pages, each queue
will get about 41,600 pages. So the 0.25% value at which an individual queue will
be flushed is 41,600 * 0.25% ≈ 100 pages. So any queues that manage to stay
under 100 pages will not be flushed, and will be left for the checkpoint to take
care of.
If the 48 queues are all at 96 dirty pages, that can still leave 4600 dirty pages for
the checkpoint to deal with. So keep that in mind when dealing with the
optimization of these values.
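As a quick sanity check, the per-queue numbers from this example can be reproduced
with shell arithmetic (bc is used because the threshold is a fractional percentage; the
input values are the ones from the example above):

$ echo "scale=1; 2000000 / 48" | bc          # pages per LRU queue
41666.6
$ echo "scale=0; 41666 * 0.25 / 100" | bc    # per-queue flush threshold at 0.25%, roughly the 100 pages mentioned above
104
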
onstat -F
The second command will show how many times the system used the LRU
automatic processing of dirty pages (good) and how many times it had to let the
checkpoint do the cleanup (not good).

$ onstat -F

IBM Informix Dynamic Server Version 10.00.FC6 -- On-Line (Prim) -- Up 3 days 10:39:36 -- 8576004 Kbytes

Fg Writes   LRU Writes  Chunk Writes
0           0           3880512

FG Writes (see section on Foreground Writes) are even worse than the LRU
Writes, so it's good there are 0 of those. However, in this example, the LRU writes
are also 0, and the Chunk Writes (checkpoint writes) are the largest number,
meaning the automatic LRU cleanup is not kicking in at all.
The BUFFERPOOL setting on the system the above examples are taken from is
as follows:

BUFFERPOOL
size=2k,buffers=2000000,lrus=48,lru_min_dirty=20,lru_max_dirty=30

This means that the LRU cleanup will only kick in at 30% dirty pages, which is about
600,000 pages, or 1.2GB of changed data, within the 5 minute checkpoint interval.
Bottom line: the checkpoint is left with all the work.
To make sure the system doesn't leave the checkpoint with all the dirty pages, the
BUFFERPOOL setting would be better at:

BUFFERPOOL
size=2k,buffers=2000000,lrus=48,lru_min_dirty=0.0,lru_max_dirty=0.25
so that the cleanup is done outside of the checkpoint processing, and the system
is not suspended.
The lru_min_dirty is set at 0, so that once the LRU cleaners start they handle all
of the outstanding dirty pages in the buffers. This setting can be changed if the
system is spending too much time flushing the LRUs, however it is a good starting
value.
onstat -p
This command will report several values about the operations of the database
engine. The ones important for this document are the numckpts and the
ckpwaits.

$ onstat -p

IBM Informix Dynamic Server Version 10.00.FC6 -- On-Line (Prim) -- Up 3 days 15:53:43 -- 8576004 Kbytes

Profile
dskreads   pagreads   bufreads     %cached  dskwrits  pagwrits  bufwrits   %cached
53792905   143808046  68046147018  99.92    8099807   30614827  548449520  98.65

isamtot      open       start        read         write     rewrite  delete   commit   rollbk
91907689451  831763342  14666737935  29434250343  76476630  5130793  3928657  2943475  2552

gp_read  gp_write  gp_rewrt  gp_del  gp_alloc  gp_free  gp_curs
0        0         0         0       0         0        0

ovlock  ovuserthread  ovbuff  usercpu    syscpu    numckpts  flushes
0       0             0       429435.66  11581.81  1145      2294

bufwaits  lokwaits  lockreqs    deadlks  dltouts  ckpwaits  compress  seqscans
2593652   18309     1788065493  44       0        9249      38788022  1139963450

ixda-RA  idx-RA  da-RA    RA-pgsused  lchwaits
5664156  279570  3368245  9310233     315895484

numckpts - the number of checkpoints performed since the IDS instance was started.
ckpwaits - the number of times a user session had to wait for a checkpoint to complete.
The ckpwaits value is the one to be mindful of, as it shows how often the
checkpoints affect user sessions against the database server. The lower this
number is, the better optimized the server is for the type of use it is under.
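A simple way to watch these counters over time is to sample onstat -p on the checkpoint
interval. A minimal sketch follows; the log path is a placeholder, and the grep relies on
ckpwaits appearing in a header line of the output, so adjust the filter if your onstat
output is laid out differently:

while true; do
    date
    onstat -p | grep -A 2 ckpwaits
    sleep 300
done >> /tmp/ckpt_waits.log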

The CKPTINTVL - Making Checkpoints Less Frequent
The CKPTINTVL parameter controls the number of seconds between the
recurring executions of the checkpoint operation performed by the database
engine.
The discussions above all revolve around making sure that the LRU cleaners
leave no dirty pages in the LRU buffers by the time a checkpoint is executed
every CKPTINTVL seconds.
You can increase the CKPTINTVL parameter to a larger value, therefore leaving
more time for the LRU dirty pages to accumulate, and more time for them to be
cleaned out.
This change should be done carefully as the checkpoint operation makes sure
that the database memory state is in sync with the disk storage. Checkpoints
done too far apart will cause longer recovery times after an abnormal shutdown
(database crash, hardware failure, etc).
After the LRU settings are set properly and proven in a production environment,
this setting can be changed so that the checkpoints occur less often than every 5
minutes. There is nothing wrong or terribly dangerous about changing the setting, it
just needs to be done in a controlled manner and with proper monitoring so that
the resulting benefits are clear.
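For example, doubling the default interval would look like this in the onconfig file (an
illustrative value only, to be validated against your recovery time requirements):

CKPTINTVL 600    # checkpoint every 10 minutes instead of the default 300 seconds
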
Even if this parameter is set to a larger value than 5 minutes (300 seconds) the
database engine will not be left without checkpoints being executed. The engine
still executes checkpoint operations other than the ones run at regular intervals.
Some conditions that trigger checkpoints are:

The physical log filling up over 75%


The next logical log to be used contains the last checkpoint record in it.
Adding a dbspace or a chunk.
Clean shutdown of the database instance.
Backup operations
Tuning The Data Dictionary (DD)
The IDS engine uses a table schema caching feature called "Data Dictionary"
(DD). This feature ensures the layout information about the tables in the database
is kept in memory so it doesn't have to cause disk activity when the application
queries use the database tables.
The DD cache is implemented as a list of lists, so that when the application
queries need to look up whether a table's schema is in memory, they scan the
dictionary lists. If the table information is in memory, it is used; if it is not in
memory, it is read from disk, put in the dictionary cache, and then used from
memory.
The number of lists in the DD and the number of elements in each individual list
(called a bucket) are controlled by 2 parameters in the onconfig file. These
parameters are usually not present in the basic configuration, so the database
uses their default values.
Informix uses a hashing algorithm to decide which table will be placed in which
bucket. This hashing feature allows the lookup of tables in the DD to happen
quickly, by making sure that the bucket containing the table information can be
determined easily.

Hashing Overview
An example of hashing would be to put all tables in a memory bucket based on
the first letter of the table name. So all tables starting with 'a' would be in the
a_bucket, starting with 'b' in the b_bucket, and so on.
So when an application query needs to use the column names of a table named
'airports', it will know which bucket to go into and look for the cached table
schema.
The hashing method of lookup has the advantage of a quick determination of the
bucket containing the data. However, once the bucket is determined, finding the
actual table within the bucket is a simple comparison of the names of the tables
in that bucket.
Even with this simple hashing example, the problem becomes obvious in cases
where most of the tables in the system have names starting with the
letter 'a':

In this case, the a_bucket will end up containing many tables in it.
To find if a table is actually in the a_bucket, a name comparison will have to be
done many times.
This is a very expensive operation when queries are expected to execute in
tens of milliseconds.
So making sure the buckets stay small is very important. This is where the DD
onconfig parameters become useful.

Relevant ONCONFIG Parameters


The following parameters can be used to control the size and behavior of the
DD. Changing these parameters requires a restart of the IDS instance.
The DD_HASHSIZE Parameter
The DD_HASHSIZE parameter controls the number of lists that will be used to
store the cached table schema information. The default value for this parameter is
31.
For RRT's application, 31 is not large enough, so increasing this value is strongly
recommended. Keep in mind that this value has to be a prime number, so when
tuning it make sure you select a prime value to use.
We recommend starting with 1051 as a value for this parameter.
The DD_HASHMAX Parameter
The DD_HASHMAX parameter controls the number of entries each bucket will be
allowed to contain. The default value is 10. As with the DD_HASHSIZE
parameter, this value is not optimal for the RRT application.
We recommend starting with 4 as a value for this parameter.
Recommended Values
An example of what the settings should look like inside the onconfig file:

DD_HASHSIZE 1051
DD_HASHMAX 4

Monitoring The Data Dictionary


To monitor the layout of the DD hash buckets in the database engine, use the
onstat -g dic command. The command shows detailed DD usage;
the following filtered version will show you the important details:

$ onstat -g dic | grep -i dict


Dictionary Cache: Number of lists: 31, Maximum list size: 10
Total number of dictionary entries: 375

The important part is to make sure that the DD is not filling up to over its
configured capacity. If the DD is using up all the slots in all the buckets, it will
have to remove tables from the cache in order to introduce the new ones being
requested.
In the above example configuration, the DD is set up to handle up to
310 entries (31 lists * 10 entries per list); however, because of the type of use
the system is under, there are 375 entries in the DD cache.
This is a very inefficient scenario and should be avoided as much as possible.
When increasing the DD parameters in the onconfig, work on increasing the
DD_HASHSIZE parameter. Making DD_HASHMAX larger will make the
buckets contain more tables, which will cause the lookup of individual
tables by name to slow down.
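A quick way to compare the configured capacity against the current usage (a sketch
based on the onstat -g dic output shown above; the shell arithmetic uses the
recommended values):

# configured capacity = DD_HASHSIZE * DD_HASHMAX
$ echo $((1051 * 4))
4204
# current usage, as reported by the engine
$ onstat -g dic | grep -i 'dictionary entries'
Total number of dictionary entries: 375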

References
http://publib.boulder.ibm.com/infocenter/idshelp/v10/topic/com.ibm.perf.doc/perf112.htm

Tuning Memory Access


As of IDS 10.xC5, a new feature was added that improves how the CPUs
allocated to the database engine interact when allocating memory.

Basics Of Memory Access


When the individual engine virtual processors (VP) execute the queries from the
application, they allocate memory for the execution of the query to work with.
This memory comes from the engine's global shared memory pool, and it is
managed using a bitmap representing the free and used memory pages. In order
for a VP to allocate memory for the execution of a query it has to do the following:

Lock the bitmap so no other VP is accessing it.


Search the bitmap for free memory pages.
Mark the free pages as used.
Unlock the bitmap.
Once the VP is done with the query execution, it has to repeat the same steps to
return the memory pages to the shared memory pool.
On a busy system with many CPUs allocated to the database, the above
operations can cause many of the VPs to have to wait for a long time before they
can execute the queries with the memory they need.

The VP_MEMORY_CACHE_KB Parameter (IDS 10)


In order to alleviate this condition, the database can be configured to allow each
VP to keep its own set of memory pages that it can use and re-use for execution
of queries.
This way, the VPs do not have to compete for the single common shared memory
pool bitmap, except in cases when their own pool is not enough for the execution
of the query.
The new setting, VP_MEMORY_CACHE_KB, specifies, in KB, the amount of
memory each VP is allowed to keep for its own use.
Setting the parameter to 0 will disable this per-VP memory pool feature.

NOTE: This setting was introduced into IDS 10 with a minor bug which leaves the
VP free to allocate unlimited memory, causing the database engine to grow its
memory use without limit. If you set this parameter to a specific value,
this problem disappears. More info on this problem is available at:
http://www-03.ibm.com/developerworks/blogs/page/gbowerman?entry=memory_leak_in_ids_10
From our experience, the 64 bit version of the engine does not have this issue,
however this is not confirmed by IBM, so it's best to set this parameter to some
value.
Recommended Configuration
Start with a value of 20MB, and then monitor the system to make sure the
memory is used properly.
You can set this parameter on an already running engine using the command:

$ onmode -wm VP_MEMORY_CACHE_KB=20000

And make sure you set the parameter in the onconfig file for permanent effect:

VP_MEMORY_CACHE_KB 20000

Monitoring the VP Memory Use


Use the onstat -g vpcache command to monitor how each VP is using its
memory cache. Let the system function for a few weeks before deciding to alter
the value of the VP_MEMORY_CACHE_KB setting, unless there are obvious
system problems because of it.

$ onstat -g vpcache | grep -E 'vpid|%'


vpid pid Blocks held Hit percentage Free cache
1 12623 4 71.1 % 61.6 %
vpid pid Blocks held Hit percentage Free cache
3 12625 8 61.1 % 53.3 %
vpid pid Blocks held Hit percentage Free cache
4 12626 6 67.0 % 61.7 %

The DS_NONPDQ_QUERY_MEM Parameter (IDS 10)
With IDS 10, the configuration allows for the memory allocated to queries in the
system to be specified using this new parameter.
The maximum value of the parameter is limited to 25% of the
DS_TOTAL_MEMORY setting, which is used in the parallel query (PDQ) feature
of the database. Please review the section in this document relating to Update
Statistics for an overview of PDQ.
To make sure the application queries have sufficient memory for their processing,
and that they avoid using the temp dbspaces unless necessary, set this value to
its maximum using the 25% of DS_TOTAL_MEMORY rule.
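For example, with the DS_TOTAL_MEMORY value used in the PDQ configuration later in
this document, the onconfig entries would look like this (values in KB, illustrative only):

DS_TOTAL_MEMORY      2048000   # 2 GB, as in the PDQ example below
DS_NONPDQ_QUERY_MEM  512000    # 25% of DS_TOTAL_MEMORY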

References
The IBM online documentation on this feature -
http://publib.boulder.ibm.com/infocenter/idshelp/v111/index.jsp?topic=/com.ibm.perf.doc/perf76.htm
The discussion on the minor bug -
http://www-03.ibm.com/developerworks/blogs/page/gbowerman?entry=memory_leak_in_ids_10
The DS_NONPDQ_QUERY_MEM online documentation -
http://publib.boulder.ibm.com/infocenter/idshelp/v10/index.jsp?topic=/com.ibm.adref.doc/adref82.htm
The DS_NONPDQ_QUERY_MEM setting discussion -
http://www.webservertalk.com/archive221-2006-4-1458075.html
Tuning Update Statistics & Effects of Parallel Data Query (PDQ)
When the database engine executes queries from the application, it has to
determine the best way to gather the data that the query is requesting.
In order to determine the fastest and least resource intensive execution for a
query, the database engine keeps information (statistics) about the table and
index data in the database.
This information describes how the data is stored in the tables, and how the
indexes are laid out in regards to being helpful in the execution of queries.
As the data in the tables changes (new rows inserted, rows deleted and updated)
the statistics about the data layout change, so the engine needs to refresh the
information it keeps on the data in all the tables.
The task responsible for this refreshing of statistics is the "UPDATE STATISTICS"
operation. Making sure the statistics are properly kept will make a major
difference in the way the application functions. Make sure you review this section
carefully, and that you review the References section for in-depth information
about this important aspect of database engine performance.

Recommended Method For Basic Update Statistics


The update of statistics operation is usually configured at the time of installation of the
RRT application. It is set up to run as a nightly cron job, so that the statistics are
updated for the daytime peak operation.
Depending on the team responsible for the operation of the database server, the
cron jobs will vary in syntax and implementation, however, the important part is
the SQL command used to update the statistics.

echo "update statistics low for table drop distributions;" | dbaccess rrtdb
echo "set PDQPRIORITY 0; update statistics for procedure;" | dbaccess rrtdb

(see the following sections on PDQ for the explanation of the PDQPRIORITY
parameter)
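A minimal cron sketch of such a nightly job; the schedule and log path are placeholders,
and it assumes the Informix environment variables are already set for the cron user:

# hypothetical crontab entries for the informix user
0 2 * * * echo "update statistics low for table drop distributions;" | dbaccess rrtdb >> /tmp/update_stats.log 2>&1
0 3 * * * echo "set PDQPRIORITY 0; update statistics for procedure;" | dbaccess rrtdb >> /tmp/update_stats.log 2>&1
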
If you would like to use other methods for updating the statistics, please contact
RRT to discuss the impact to the application performance and operation.

PDQ, Parameters & Appropriate Usage


The area of Parallel Database Queries (PDQ) requires proper understanding of
the feature as implemented in the Informix database.
Please use the resources listed in the References section to make sure you have
a proper understanding of this concept, and its effects, before changing the
parameters from their default values.
The PDQ feature is mainly useful for decision support operations that involve:

Data mining
In depth reports
Long running database transactions (minutes to hours)
Very few active clients.
The RRT application will in general not benefit from heavy use of PDQ. If you are
going to change configuration parameters related to PDQ and its direct effects
on the RRT application functionality, please contact us before making the
changes in a production system.
ONCONFIG Parameters
MAX_PDQPRIORITY
Set in the onconfig file, ranges from 0 to 100, representing percent of resources
used for parallel queries.
Having this parameter set to 50 will give you the ability to use the parallel query
feature for administrative tasks, as the PDQPRIORITY parameter is the
controlling value for how each database client session uses the PDQ feature.
If you don't have any need for the PDQ functionality, setting this parameter to a
low value, or even turning it off (set to 0) should be considered.
DS_MAX_QUERIES
This parameter limits the number of simultaneous queries which will be processed
as parallel queries using the PDQ optimization. A query will be counted against
this parameter value if it has its PDQPRIORITY set to a non-zero value, using the
environment variable PDQPRIORITY, or the session parameter with the same
name.
As the RRT application does not benefit from PDQ optimizations in the majority of its
operation, this parameter should be set to a relatively small value. A good starting
value is 20; adjust as needed.
DS_TOTAL_MEMORY
This parameter controls the maximum amount of memory that the database
engine will dedicate to processing queries whose PDQPRIORITY parameter is
set to a non-zero value.
This parameter becomes relevant for the RRT application only in relation to the
DS_NONPDQ_QUERY_MEM value. In order to allow the non-PDQ queries to
benefit from the available memory on your server, set the DS_TOTAL_MEMORY
value to 1/2 of the total memory you allocate to the database with the
SHMVIRTSIZE parameter, and then monitor the database operation to adjust the
value accordingly.
DS_MAX_SCANS
This parameter specifies the maximum number of PDQ scan threads the
database engine will allow. Start with a value of 100 as the parameter is not
relevant to the general operation of the RRT application.
DS_NONPDQ_QUERY_MEM (IDS 10)
See
Informix_Tuning#The_DS_NONPDQ_QUERY_MEM_Parameter_.28IDS_10.29
Example PDQ Configuration
SHMVIRTSIZE          4096000   # 4 GB of RAM allocated to the database
...
MAX_PDQPRIORITY      100       # allow PDQ queries to use all of the PDQ resources
DS_MAX_QUERIES       20
DS_TOTAL_MEMORY      2048000   # 50% of SHMVIRTSIZE = 2 GB of memory allowed for PDQ operations
DS_MAX_SCANS         100
DS_NONPDQ_QUERY_MEM  512000    # 25% of DS_TOTAL_MEMORY = 500 MB

You can change the above settings on a live configuration with the following
commands. NOTE: As with any live changes, it is best to avoid them unless
absolutely necessary.

# determine the memory allocated for the database
$ export ids_shm=$(onstat -c | grep SHMVIRTSIZE | awk '{print $2}')
$ echo $ids_shm
4096000

$ onmode -D 100                                  # MAX_PDQPRIORITY 100
$ onmode -S 100                                  # DS_MAX_SCANS 100
$ onmode -Q 20                                   # DS_MAX_QUERIES 20
$ onmode -M $[$ids_shm/2]                        # DS_TOTAL_MEMORY = SHMVIRTSIZE/2
$ onmode -wm DS_NONPDQ_QUERY_MEM=$[$ids_shm/8]   # DS_NONPDQ_QUERY_MEM = DS_TOTAL_MEMORY/4 = (SHMVIRTSIZE/2)/4 = SHMVIRTSIZE/8

The PDQPRIORITY Setting


Ranges from 0 to 100, representing the percent of MAX_PDQPRIORITY that the
current database session will use.
The value of this setting can be specified in 2 ways. One is by setting it as an
environment variable before invoking the client that will connect to the database:

export PDQPRIORITY=50
dbaccess rrtdb test.sql

The other is by setting it explicitly in the session to the database before executing
the SQL command:

echo "set PDQPRIORITY 50; select * from some_table;" | dbaccess rrtdb

The default value is 0, which turns off the PDQ feature. This is not the case for
stored procedures that are compiled with an explicitly defined PDQPRIORITY
setting; for details, see the following note.
NOTE: If you use this parameter with UPDATE STATISTICS keep in mind the
following:

Update statistics will not execute in parallel; instead, it will only use the
memory resources specified for PDQ. Before considering this option, review
the DBUPSPACE parameter section later in this document.
Updating the statistics for stored procedures will record the value of the
PDQPRIORITY parameter with the stored procedure. When executed, the
stored procedure will use the value of PDQPRIORITY from the update
statistics session, and not that of the session invoking the stored procedure.
Optimizing The Update Statistics Process
In cases where the update statistics processing takes an extraordinary amount of
time, and production operation is affected, there are a few tuning options to help
improve the duration of the update statistics operation.
The configuration changes are recommendations only, and should be monitored
on the actual hardware in order to make sure they provide the expected
improvements.
To make sure that the statistics processing has enough memory to perform the
sorting and analysis of the table data, we can allow it to run as a PDQ process, thus
gaining access to the PDQ memory.
So the basic update statistics would look something like this:

echo "set PDQPRIORITY 100; update statistics low for table drop
distributions;" | dbaccess rrtdb
echo "set PDQPRIORITY 0; update statistics for procedure;" | dbaccess rrtdb

The next change would be to run the update statistics in parallel, so that multiple
tables can be updated at the same time. NOTE: This is only helpful if the system
has enough CPUs dedicated to the database engine to handle the parallel update
statistics.
The rrt_update_statistics.sh Script
With input from administrators from existing installations, we have put together a
script that can automate this parallel update statistics operation. You can retrieve
the script at:

rrt_update_statistics.sh
The basic usage of the script is:

$ rrt_update_statistics.sh
USAGE: rrt_update_statistics.sh <database name> <number of sessions>
<notification email>

The script will basically split the tables in the system into <number of sessions>
groups and execute the update statistics for each group simultaneously.
The value of the <number of sessions> parameter will depend on the number of
CPUs you have dedicated to the database. It is best to keep that number at 1/3rd
of the number of CPUs you have allocated to the database. You can experiment
with larger values, as the benefit of parallel I/O operations may still gain
improvements even on systems with fewer CPU resources.
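For example, on a server with 12 CPU VPs dedicated to the database, a run with 4
parallel sessions could be started as follows (the notification address is a placeholder):

$ rrt_update_statistics.sh rrtdb 4 dba@example.com
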
With the PDQ settings properly set, this approach will provide for a better use of
the system resources when the update of the statistics is executing.
NOTE: Monitor the execution of the above script for the first few days and make
sure that the system is not being overloaded.
DBUPSPACE Parameter
The update statistics operation uses the DBUPSPACE parameter to determine
the amount of disk and memory it will use. The parameter is set as an
envrionment variable for the session that will execute the update statistics
statement.
The parameter value is composed of 2 numbers:

export DBUPSPACE=<disk space KB>:<memory MB>

If this parameter is not set, the default values (as of IDS 10) are:

1000 for the disk space
15 for the memory.
So the update statistics will use 1MB of disk space and 15MB of memory for its
operation.
The <memory MB> parameter can only go as high as 50. Higher memory can be
made available by setting the PDQPRIORITY parameter for the update statistics
session; however, this can have other side effects, so please read the
sections on PDQ earlier in this document before using this parameter.
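For example, to give the session roughly 1 GB of sort space on disk and the maximum
50 MB of memory (illustrative values only):

export DBUPSPACE=1000000:50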

References
http://www.ibm.com/developerworks/db2/zones/informix/library/techarticle/miller/0203miller.html
http://publib.boulder.ibm.com/infocenter/idshelp/v10/topic/com.ibm.sqls.doc/sqls887.htm#sii-02upstat-17587
http://publib.boulder.ibm.com/infocenter/idshelp/v10/topic/com.ibm.docnotes.doc/uc6/ids_sqlr_docnotes_10.0.html#wq15
http://www.iiug.org/waiug/archive/iugnew2000Fall/How_to_PDQ.htm
http://publib.boulder.ibm.com/infocenter/idshelp/v10/topic/com.ibm.perf.doc/perf333.htm#perf121003254
http://docs.rinet.ru/InforSmes/ch13/ch13.htm#Heading5
http://publib.boulder.ibm.com/infocenter/idshelp/v10/topic/com.ibm.perf.doc/perf331.htm
BTREE Cleaner Optimization
During any database operation, when the table data is altered (deleted or
updated), all indexes created against the modified fields/columns also have to be
updated. In order to keep the wait time for the delete/update operations to a
minimum, database engines usually leave the old index data cleanup for a
separate cleaner module, that runs in parallel with the regular database operation.
This cleaner module in Informix is called the BTREE cleaner. The BTREE cleaner
runs as an internal database session, with a default priority set to 'low'. The low
priority means that it will not take resources away from other sessions in the
database.
In a normal operation, the BTREE cleaner activates every once in a while,
performs a few brief cleanup operations and falls back into idle mode. However,
in high load environments, an improper configuration can cause the cleaner to
significantly impact the database performance.
The primary cause for the performance impact is the fact that the cleaner session
has to load the index pages into memory in order to analyze and process them.
With configurations where the thresholds for the BTREE cleaner are set to low
values, and the database engine is not given much RAM (under 2GB), this
process can cause heavy I/O load and very slow response times.
With low memory resources, the database engine is forced to push out
cached/read-ahead data from memory, so it can make room for the BTREE
cleaner to load index pages, causing the application sessions to work with table
data directly from disk, and with very little in-memory cache.
There are several approaches to dealing with this condition: some are
immediate, others require planning and downtime to implement.

Parameters Controlling The BTREE Cleaner


The Informix_Tuning#References_4 section provides links to resources with
detailed information on the BTREE cleaner and its parameters. This section only
addresses the parameters, and values, relevant to the RRT application operation.
All of the parameters in this section can be configured using the BSCANNER
parameter in the onconfig file (IDS v10 and higher), or by using the commands
shown in the Informix_Tuning#Adjusting_a_Live_Database section later on.
Number of Threads
This setting defines the number of worker threads the BTREE cleaner will use to
scan the indexes and find the ones that need to be cleaned. As this setting
determines the number of CPU VPs the BTREE session will use, setting this
value to 1 makes the most sense to begin with.
If the cleaner is configured to run in off-business hours, then dynamically setting
this value to a higher number might make sense, so that the engine can finish the
index cleanup work sooner. Naturally, this number should not be larger than the
total number of CPU VPs assigned to the database instance.
Priority
This parameter determines the priority with which the BTREE scanner will request
database resources. Set this parameter to "low", since a high priority will take
resources away from application database sessions.
NOTE: For some reason, the database engine automatically increases the priority
of the BTREE scanner over time. If you do not implement the off-hours only
method in section Informix_Tuning#Recommended_Production_Operation, then
make sure you monitor the priority of the BTREE threads using the 'onstat -C all'
command, so that you can address this automatic transition from low to high
priority.
Threshold
This setting is used by the BTREE cleaner to determine if an index should be
considered for cleanup. The value represents the minimum number of
deleted/modified entries an index should have before it will be considered for
cleanup.
The default value of 500 is very inappropriate for production level systems, as it
causes all indexes to be cleaned all the time. A larger value of 50000 is more
appropriate for the type of database usage typical high volume applications
require.
Range Size
This setting determines the method the BTREE cleaner will use in order to read
the index pages into memory. The value specifies the number of pages at which
an index is considered to be large. For large indexes the database will not use the
typical LRU based I/O; instead it will use the light scan buffers and read large
blocks of index pages at a time. The recommended value for this parameter is
10000, however it can be adjusted after sufficient information is gathered on the
behavior of the BTREE cleaner over a significant period of time.
Adjusting a Live Database
The following commands can be used to change a live instance, so it would use
the recommended BTREE scanner configuration:

onmode -C disable
onmode -C stop 2       # (if there are 2 threads running; this is visible with onstat -C all)
onmode -C start 1      # (so there is only one thread from now on)
onmode -C low
onmode -C threshold 50000
onmode -C rangesize 10000
onmode -C enable

onstat -C all | less   # (get status)
onstat -g act          # (shows active informix threads; btscanner_0 or something like that used to be there all the time, not any more)

Onconfig Configuration (IDS version 10 and newer only)
To make the BTREE cleaner configuration permanent and effective each time the
database is started, make sure the following setting is present in the onconfig file
for the engine instance.

BTSCANNER num=1,priority=low,threshold=50000,rangesize=10000

As this option is only available for IDS version 10 and onward, IDS 9.4x
installations will have to use the method described in the
Informix_Tuning#Adjusting_a_Live_Database section, by putting those
commands in the start up script used to boot up the database engine.

Recommended Production Operation


As the database engine can trigger a start of the BTREE cleaner module at any
time, it is best, if possible, to delay the cleaning operation to non-business hours.
Setting up a cron entry would be a possible approach, so that the BTREE thread
is disabled at the start of business hours (6 to 8 am) and then enabled again at the
end of the high volume business day (11 pm or later).
This approach would allow the database engine to re-optimize the index page
usage, while not affecting the high volume operation during business hours.
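A minimal cron sketch of that approach; the times and the path to the Informix
environment file are placeholders to adapt to the local installation:

# disable the BTREE cleaner at the start of the business day
0 7 * * * . /home/informix/.profile; onmode -C disable
# re-enable it after the high volume hours are over
0 23 * * * . /home/informix/.profile; onmode -C enable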
ALICE Mode - IDS Version 11
ALICE (adaptive linear index cleaning) mode is an available mode of operation for
the BTREE cleaner. It optimizes the index selection and page cleaning so that the
cleaner's drain on the database resources is minimized.
As we experiment with this option in our IDS 11 test environments, we will provide
further info and recommendations on this feature. The feature was released in an
update to IDS 10 (xC6) as well, however as it is originally a feature of IDS 11, we
leave it to be a part of the IDS 11 certification efforts.

References
http://publib.boulder.ibm.com/infocenter/idshelp/v10/index.jsp?topic=/com.ibm.adref.doc/adref50.htm
http://publib.boulder.ibm.com/infocenter/idshelp/v10/topic/com.ibm.adref.doc/adref320.htm#sii03a889486
http://www.informix-support.co.uk/btscanner.htm
http://publib.boulder.ibm.com/infocenter/idshelp/v111/index.jsp?topic=/com.ibm.perf.doc/perf381.htm
http://www.ibm.com/developerworks/data/library/techarticle/dm-0810duvuru/index.html

Virtual CPU Control


The Informix database uses the concept of "Virtual Database Processor" (VP) to
control how many parallel operations it will handle at one time.
This functionality allows the database to be configured to use a specific number of
physical CPUs, thus making it possible for an installation to share the CPU
capacity of the hardware with other applications.
You can use the following command to check the number of VPs
currently being used:

$ onstat -g sch | grep cpu


1 1710 cpu 433 1471 4537
3 1712 cpu 447 474 9582
4 1713 cpu 223451790 243551530 9358
5 1714 cpu 164518798 183347892 9193
6 1715 cpu 119217161 136478554 8999
7 1716 cpu 87503971 103276370 8788
8 1717 cpu 64599744 78847074 8564
9 1718 cpu 47597853 59987176 8360
10 1719 cpu 34704464 45486970 8124
11 1720 cpu 25401777 34548633 7913
12 1721 cpu 18468596 26250916 7671
13 1722 cpu 13454760 20126516 7410
14 1723 cpu 9890927 15674085 7133
15 1724 cpu 7374562 12429135 6860
21 22234 cpu 3842 10709 5299
1 1710 cpu 576486298 78848908 25505707 10640899 717292997 0
3 1712 cpu 877987480 213842274 159978909 98617953 455654097 0
4 1713 cpu 493487404 88667451 25416255 9967342 0 0
5 1714 cpu 419410303 83758783 23300902 8080038 0 0
6 1715 cpu 355149640 78654280 21488605 6636409 0 1
7 1716 cpu 302929447 74197056 20461350 5535296 0 0
8 1717 cpu 257516927 69410588 19124008 4603361 0 0
9 1718 cpu 216265250 64513462 18204535 3747843 0 0
10 1719 cpu 180199106 59625508 17211777 3017244 0 0
11 1720 cpu 150390851 55211415 16538855 2429914 0 0
12 1721 cpu 125461995 50977873 15950994 1944866 0 0
13 1722 cpu 106006084 47597175 15519168 1578949 0 0
14 1723 cpu 90847364 44586931 15100246 1300593 0 0
15 1724 cpu 79183863 41979071 14707747 1087629 0 0
21 22234 cpu 170098 139528 93681 1528 0 0

Planning The Number of VP CPUs


The simplest rule for the number of VPs assigned to the database is to not
exceed the number of physical CPU cores on the hardware.
The optimal configuration should leave at least one CPU core free, so that the
operating system and system services can function without degradation in
response.
On systems where Hyperthreaded CPU cores are used, the database can be
assigned a VP per Hyperthread. This means that a 4 core CPU with
Hyperthreading technology can theoretically support an Informix instance with 8
VPs. It would be best to not exceed 6, however short periods of high database
demand can be managed with the 2 additional VPs.
As with any resource utilization configuration, it is best to closely monitor your
operation and determine the optimal configuration based on your end results.
Make sure you re-evaluate your Virtual Processor configuration if you migrate to
newer hardware, or if you introduce load sharing technologies like HDR, RSS,
SDS and so on.
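For a permanent configuration, the CPU VP count is set in the onconfig file. Below is a
sketch for an 8-core host that leaves headroom for the operating system; which of the
two parameters applies depends on the IDS version and on which style the existing
onconfig already uses:

VPCLASS cpu,num=6      # 6 CPU VPs on an 8-core host
# or, in configurations using the older style parameter:
NUMCPUVPS 6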
Changing The Number Of VP CPUs
To increase the number of VPs on a running instance use the following
command:

onmode -p +1 CPU

To decrease the number of VPs on a running instance use the following
command:

onmode -p -1 CPU

Monitoring The Number of Asynchronous Input/Output Workers (AIO)
The Informix database utilizes a pool of disk read/write worker processes
responsible for reading data from and writing data to the database chunks.
These workers can become over-utilized due to high activity, and need to be
monitored on a periodic basis, especially during peak processing times, so that
an optimal configuration can be achieved.
The following command displays the usage of the AIO workers in an Informix
installation:

$ onstat -g iov

IBM Informix Dynamic Server Version 10.00.UC9 -- On-Line (Prim) -- Up 123 days 10:56:11 -- 1631196 Kbytes

AIO I/O vps:
class/vp  s  io/s   totalops    dskread     dskwrite   dskcopy  wakeups     io/wup  errors
msc  0    i  0.9    9947735     0           0          0        9436366     1.1     0
aio  0    s  253.0  2698311132  2637077672  56068878   0        879325427   3.1     0
aio  1    i  79.2   845108956   788868188   54756088   0        3515001800  0.2     0
pio  0    i  0.8    8907313     0           8907313    0        8907313     1.0     0
lio  0    i  17.9   190505642   0           190505642  0        190503778   1.0     0

You can use the following command to see the AIO statistics for each database
chunk file:

$ onstat -g iof

IBM Informix Dynamic Server Version 10.00.UC9 -- On-Line (Prim) -- Up 123 days 11:05:58 -- 1631196 Kbytes

AIO global files:

gfd pathname totalops dskread dskwrite io/s
3 rootdbs_chunk01 1170700 825169 345531 0.1
4 phydbs1_chunk01 8907925 82 8907843 0.8
5 logdbs1_chunk01 186834490 25224945 161609545 17.5
6 logdbs1_chunk02 33349857 4439865 28909992 3.1
7 dbs1_chunk01 1342712919 1340223745 2489174 125.9
8 dbs1_chunk02 2835998101 2835310991 687110 -136.8
9 dbs1_chunk03 2187052477 2185495926 1556551 -197.6
10 dbs2_chunk01 340585077 338874980 1710097 31.9
11 dbs2_chunk02 739404855 732594244 6810611 69.3
12 dbs2_chunk03 1 1 0 0.0
13 dbs3_chunk01 425297661 391663066 33634595 39.9
14 dbs3_chunk02 502557433 502232068 325365 47.1
15 dbs3_chunk03 492272496 490127316 2145180 46.1
16 dbs4_chunk01 626185932 625553254 632678 58.7
17 dbs4_chunk02 267427760 267181958 245802 25.1
18 dbs4_chunk03 203498962 203414491 84471 19.1
19 dbs5_chunk01 385472108 378606789 6865319 36.1
20 dbs5_chunk02 13919968 8951406 4968562 1.3
21 dbs5_chunk03 13790675 8834790 4955885 1.3
22 tempdbs1_chunk01 33646371 17910060 15736311 3.2
23 tempdbs1_chunk02 3 1 2 0.0
24 dbs5_chunk04 5779205 3544398 2234807 0.5
25 dbs5_chunk05 997252567 995206385 2046182 93.5
26 dbs5_chunk06 1841097314 1832923066 8174248 172.6
27 dbs1_chunk04 2780220554 2770179172 10041382 -142.0
28 dbs5_chunk07 1 1 0 0.0
29 dbs1_chunk05 310881353 306116094 4765259 29.1
30 dbs4_chunk04 43280399 42903943 376456 4.1
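Since the AIO load varies with the time of day, it helps to capture these statistics
periodically during peak hours; a minimal sketch (the log path and the sampling
interval are arbitrary placeholders):

# capture AIO worker and per-chunk statistics every 5 minutes
while true; do
    date
    onstat -g iov
    onstat -g iof
    sleep 300
done >> /tmp/informix_aio_stats.log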

Changing The Number Of VP AIO Workers


To increase the number of AIO VPs on a running instance use the following
command:

onmode -p +1 AIO

To decrease the number of AIO VPs on a running instance use the following
command:

onmode -p -1 AIO
