
RDP364

SAP Sybase Replication Server Internals and


Performance Tuning
Chris Brown & Jeff Tallman, Enterprise Systems Group, SAP
brown.chris@sap.com & jeff.tallman@sap.com
Disclaimer

This presentation outlines our general product direction and should not be relied on in making a
purchase decision. This presentation is not subject to your license agreement or any other agreement
with SAP. SAP has no obligation to pursue any course of business outlined in this presentation or to
develop or release any functionality mentioned in this presentation. This presentation and SAP's
strategy and possible future developments are subject to change and may be changed by SAP at any
time for any reason without notice. This document is provided without a warranty of any kind, either
express or implied, including but not limited to, the implied warranties of merchantability, fitness for a
particular purpose, or non-infringement. SAP assumes no responsibility for errors or omissions in this
document, except if such damages were caused by SAP intentionally or through gross negligence.



SAP SRS Internals & Performance Tuning

SRS Internals & Processing


• RepAgent User
• SQM (Stable Queue Manager)
• SQMR/SQT (SQM Reader & Stable Queue Transactions)
• DIST (Distributor)
• RSI (Replication Server Interface) and RSI User
• DSI (Data Server Interface)
• DSIEXEC (DSI Executor)
• DSIHQ (HVAR and RTL)
Extra Materials (available via email request for slides)
• ASE RepAgent Multi-threading, MPR & Multiple Scanners
• Direct Load (Create subscription)
• Heterogeneous RepAgent Internals & Processing



SAP SRS Internals & Performance Tuning

Unfortunately, we will be rushing things a tad….(tad = more than “a bit”)


Section Duration (min)
Intro 5
Lab 1 15
RepAgent User 10
SQM 10
Lab 2 10
SQMR/SQT 10
DIST 5
RSI 5
Lab 3 10
DSI 10
DSIHQ 10
DSIEXEC 10
Lab 4 10
Total 120

…but 4-hour slots were not allowed for RDP…


SRS Internals & Processing
How data flows through the SRS
SRS Threading

SRS uses POSIX threading for SMP

Each thread executes one or more SRS code modules


• The name of a thread is often derived from the principal module that thread executes
• E.g., the SQT thread really runs both the SQMR and SQT modules, but since SQT is the principal function of the
thread, it is known as the SQT thread.

Threads communicate in several ways


• Internal OpenServer message queues (a la JMS or MQ, but within OpenServer)
• Shared data caches (e.g. SQT cache, SQM page cache, etc.)
• OpenServer callback routines



SRS high level data flow (normal)

[Diagram: Rep Agent (external to RS) → (1) Rep Agent User (EXEC) → (2) Stable Queue Manager (SQM) → (3) Inbound Queue (IBQ) on the stable device → (4) Stable Queue Transaction (SQT, with SQT cache) → (5) Distributor (DIST) → (6) SQM → (7) Outbound Queue (OBQ) → (8) Data Server Interface (DSI/DSI-S, with SQT cache) → (9) DSI Executor(s) (DSIEXEC) → (10) replicate database. System Table Services (STS) with its STS cache and the RS System DB (RSSD) support the flow.]
SRS data flow - normal

1. RepAgent sends replicated changes to SRS using Log Transfer Language


2. RepAgent User parses and normalizes the data and then submits it to the SQM
3. SQM writes data to the inbound queue
4. SQT reads data from the inbound queue
5. SQT sorts the replicated changes into commit order using SQT cache
6. DIST forwards replicated changes to the outbound queue for subscribing sites
7. SQM writes data into the outbound queue
8. DSI reads data from the outbound queue

9. DSI determines how to apply the data (language, bulk, grouping)


• SQT cache is used to resort the transactions in case of multiple sources to same destination

10. DSIEXEC sends the replicated commands to the replicate database



SRS data flow – logical connection (aka warm standby)

[Diagram: Rep Agent (external to RS) → (1) Rep Agent User (EXEC) → (2) SQM → (3) Inbound Queue (IBQ) on the stable device → (4) WS-DSI (running the DIST/SQT modules, with SQT cache) reads the IBQ → (5) sorts into commit order and determines how to apply → (6) DSIEXEC → standby database. STS with its cache and the RSSD support the flow; no outbound queue is used.]


SRS data flow – logical connection

1. RepAgent sends replicated changes to SRS using Log Transfer Language

2. RepAgent User parses and normalizes the data and then submits it to the SQM

3. SQM writes data to the inbound queue

4. WS-DSI reads data from the inbound queue

5. WS-DSI sorts replicated changes into commit sequence and determines how to apply the
data (language, bulk, grouping)
• SQT cache is used to resort the transactions

6. DSIEXEC sends the replicated commands to the replicate database



SRS data flow - route

[Diagram (primary RS on the left, replicate RS on the right): (1) EXEC → (2) SQM → (3) IBQ → (4) SQT → (5) DIST → (6) SQM → (7) RSI OBQ (route queue) → (8) RSI → (9) RSI User at the next SRS → (10) DIST → (11) SQM → (12) OBQ → (13) DSI-S → (14) DSIEXEC → (15) replicate database.]


SRS data flow - route (cont.)

1. Steps 1–6 as normal

7. SQM writes data into the outbound queue for the route
• Aka route queue or RSI queue

8. RSI reads data from the route queue

9. RSI sends the data to the next SRS along the route using RTL
• Replication Transfer Language….similar to LTL

10. RSI User thread receives packets and sends to route DIST module

11. Route DIST module separates local from re-routed subscribers and writes commands to
a. local database outbound queues for subscribers in that RS
b. route queues for routes to other destinations if further routing is required

12. For databases local to the RRS, DSI processing as before


• If an IRS, route processing continues as in steps 7–11
Some basics (1)

Think of SRS as multiple pipelines with disks at each end of the pipeline
• Inbound (capture) pipeline: (log) → RA → SRS RepAgent User → SQM → (inbound queue)
• Distribution pipeline: (IBQ) → SQMR/SQT → DIST → SQM → (OBQ)
• Route pipeline: (PRS OBQ) → PRS RSI → RRS RSI User → DIST/MD → SQM → (OBQ)
• DSI (apply) pipeline: (OBQ) → DSI → DSIEXEC → RDS.RDB → (log)
• WarmStandby pipeline: (IBQ) → WS/DSI → DSIEXEC → RDS.RDB → (log)
Compare processing rates across the threads/modules
• If the RA is sending 1000 cmds/sec, then the SQM, DIST, RSI, DSI and DSIEXECs all need to be able to process
1000 cmds/sec or latency will develop
◦ If they can't and are running slower, latency will show up as disk space in the primary txn log, inbound queue or outbound
queue
◦ Slowness in one module will quickly determine the rate for all the modules in the same pipeline
– E.g. if DIST can only process 900 cmds/sec, as soon as SQT cache is full, SQT will be forced to slow to 900 cmds/sec as well
• What to look for (see the query sketch after this list):
◦ Where disk space is piling up – the problem is in the pipeline stage just after the backlog!
◦ Where time is spent in each thread/module
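
As a rough illustration of comparing rates, something along these lines can be run against counter data saved to the RSSD via admin statistics. This is a sketch only: the rs_statdetail/rs_statcounters join columns and the fixed sample interval shown here are assumptions to verify against your SRS version.

-- Hedged sketch: per-module command rates from saved RSSD counter data
declare @sample_secs int
select @sample_secs = 60          -- assumed sample interval in seconds
select c.module, c.display_name,
       sum(d.counter_obs) as total_cmds,
       sum(d.counter_obs) / (@sample_secs * 1.0) as cmds_per_sec
  from rs_statdetail d, rs_statcounters c
 where d.counter_id = c.counter_id
   and c.display_name in ('CmdsRecv', 'CmdsWritten', 'CmdsRead')
 group by c.module, c.display_name
go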



Some basics (2)

Understand the topology you are troubleshooting


• Know the source and target databases….and the SRS's along the path
• Understand whether you are using DB repdefs/subscriptions, table repdefs/subscriptions, function strings, etc.
◦ Some of this will be made more obvious as you look at the counters, but know the basics up front – e.g. it uses table
repdefs/subscriptions from a separate set of repdefs than the others due to datatype mapping, etc.

Sanity check the configs – especially memory


• Common issues include:
◦ SQT cache oversized
◦ SQM page cache oversized
◦ Server-level configs set for the highest requirement vs. adjusting the high-volume connection individually
• Consider the number of connections, the relative workload of each and the CPU resources available (as well as
memory)



The key to understanding RS monitor counters (RS MC) → "RSTOB"

Rate
• Compare the rates across the pipelines….when and where it slows indicates a bottleneck
• Remember that the rate across a segment of the pipeline will often be driven by the slowest thread
◦ You may need to look at each thread in that segment to find the root cause

Space
• If the above fails, use disk space. The problem area is in the segment of the pipeline AFTER the backlog

Time
• Each module has a lot of time counters. Use them to find out where time is being spent and focus on the
largest time chunks.

Objects
• Since a common backlog is in the DSIEXEC, look at which objects are involved and what DML (I/U/D) – then
sanity check repdefs and replicate db indexes as well as repdef owner vs. table owner marking

Bell Ringers
• Look for common faults that almost always cause significant backlog such as TransRemoved, RAWriteWaits,
DSIEXEC ResultTime, etc.
Lab #1
Analysis DB setup, running reports, summary report (latency & memory)
Lab instructions: setup

Should already be done before class

Create a new ASE server using 4k page size


• Specify 2GB of memory or more
• Extend tempdb to 1 GB
• Add two devices for data (1GB) and log (250MB+)
Installing the package
1. Cd to package install directory, then cd to the setup subdirectory
2. Edit & run the rep_analysis_db.sql script to create rep_analysis_157 database
3. Run the script rep_analysis_tables_157.sql to create the schema
4. Run the script rs_mc_analysis_procs_157.sql to create the procs
5. Run the script rs_report_headings.sql
6. Run the script rs_report_explanations.sql
7. Cd to rs_datatype
8. Edit and execute the bcp script bcp_in_rs_datatype.bat



Lab 1: RepAgent User & SQM (i)

Load the data to be analyzed:


1. Cd to lab subdirectory
2. Run the truncate.sql script to clear any residual data
3. Edit and run the bcp_in_rssd.bat script
◦ Take note of how the version of SRS is handled (rs_databases, rs_objects, etc.)
4. Run the update_stats.sql script

Run the summary report


• Use the syntax “exec rs_perf_summary”
• Save output to a file (e.g. use isql –o)
• Edit with (editor)
Run the detailed report
• Use the syntax “exec rs_perf_details ‘PDS.PDB’, ‘RDS.RDB’”
• Save output to a file
• Edit with (editor)
• Note the interval between samples
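
As a sketch, the report runs could be scripted like this via isql (server name, login, and output file are placeholders for your lab environment):

-- Run via: isql -Usa -SASE157 -w250 -o rs_summary.out -i this_file.sql
use rep_analysis_157
go
exec rs_perf_summary
go
exec rs_perf_details 'PDS.PDB', 'RDS.RDB'
go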
Lab 1: Questions

Summary Report:
1. How is data distributed (table repdefs/subscriptions, WS, db repdefs/subscr, routes)?
◦ If you have time, draw a quick sketch
2. Which connections are showing latency/backlog?
◦ Where do you suspect the problem(s) is/are?
◦ Which ones are more key?
3. How is memory allocated among the threads/caches?
◦ Is there a real potential for over-allocation or is it unlikely?

Detailed Report:
4. Look at the performance summary….how do the module throughput rates compare?
5. Where is there latency? At a high level, why?



RepAgent User Thread
The first stage in inbound processing
Stepping back through it a bit more slowly

RepAgent User performs 4 major functions

• EXEC – Executor – executes commands from the RepAgent's LTL stream
◦ Get truncation requests
◦ Distribute commands (this is the actual replicated data)
• PRS – Parser – parses LTL into tables/columns/values
• NRM – Normalizer
◦ sorts column values into column order based on repdefs
◦ eliminates unnecessary data for minimal column replication
• PAK – Packer
◦ packs the structure into a format for writing to the queue (slight compression)
◦ submits the write request to the SQM via the exec_sqm_write_request queue

In a typical SRS, this is a single thread

• More advanced features enabled by the ASO option allow multiple threads to be used



Basic RepAgent User thread processing

[Diagram: (1) Parsing (cmds) → (2) Normalization (cmd structure), consulting the STS cache/RSSD → (3) Packing → (4) write request posted to the exec_sqm_write_request queue, which feeds the SQM.]


Some performance considerations

The ACK of packets from the RepAgent waits until the RepAgent User is finished
• Depending on the type of error, SRS can request the RepAgent to resend, rescan or stop
• Once the write request is put on the write request queue, the RepAgent User sends the ACK
• However, this waiting contributes to RepAgent latency (more on this later)

Starting tuning considerations for basic operation


• Define repdef columns in the same order as the table definition (reduces normalization effort)
• sts_full_cache_rs_objects
◦ Default is off due to backward compatibility with older SRS versions
◦ If 64-bit SRS and memory is available, suggest turning this on – but ONLY if not using repdefs
– Otherwise, just set sts_cachesize to 10000
– Enabling it with repdefs could significantly increase recovery time due to a current issue in the RS 15.7.1 (and previous) design
◦ During normalization, RepAgent User checks to see if the repdef is in cache
– If not, it is a cache miss and SRS STS does a query to the RSSD to see if the repdef exists (tens of milliseconds)
• exec_sqm_write_request_limit
◦ Default now is 1MB (vs. legacy 16KB)
– If upgrading from previous releases, seriously consider raising this value to 4-8MB at a minimum, 16MB+ for heavy connections
◦ This is a buffer where RepAgent User can temporarily store write requests if the SQM is busy writing (or deleting) blocks on disk
– Once this buffer is full, further write requests are stalled (aka RAWriteWaits), which means RepAgent User thread processing suspends –
impacting RepAgent latency



RepAgent latency: 3 primary causes

ASE process kernel and deferred asynchronous events


• The ASE process kernel deliberately delays RA processing of ACKs if the engine is busy
◦ This is because responses to outbound connections (CIS, RA, etc.) use deferred async events
• This is frequently the cause if RepAgent sp_sysmon output shows a high percentage of "I/O Wait on RS" as
compared to the sp_sysmon total, and RS processing time is minimal
• Often an underlying cause is that RepAgents start on low engine numbers because no one is on the system at startup time
• Resolution is often to bind the RepAgent to the last engine online or another lightly utilized engine

The ACK of packets from the RepAgent waits until the RepAgent User is finished
• Depending on the type of error, SRS can request the RepAgent to resend, rescan or stop
• Once the write request is put on the write request queue, the RepAgent User sends the ACK
• However, this waiting contributes to RepAgent latency
• Resolution is to use Async Parsing (ASO option required)

SQM write waits due to a slow inbound queue

• This is the easiest to spot
• However, while there may be some write waits, this is only a factor when the volume of write waits is significant



RepAgent User NRM thread

Pipeline scalability
• Normally, when people think of scalability, the first thought is parallel processing
◦ Which in a sense is inherent in SRS via multiple connections, just as in a DBMS
• However, in data movement, an important aspect is "pipelining"
◦ In pipelining, multiple threads – each performing a small task – are used
◦ As soon as each thread finishes its task, it pushes the job down the pipe to the next thread
◦ This technique is especially prevalent in Complex Event Processing/SAP Event Stream Processor
• The advantage of pipelining over parallelism is that serialization is maintained
◦ Theoretically, if you used parallel processes you could achieve the same throughput
◦ In reality, much less, as
– some processes would be waiting for data to arrive to process
– parallel processes would have to synchronize data flow to maintain serialization/transaction order

NRM (Normalization) thread added as a form of pipeline scalability

• Offloads the normalization, packing and write request from the RepAgent User
• The goal was to speed up the ACK return to the RepAgent and reduce latency
◦ The ACK can now be sent as soon as the RepAgent User finishes determining whether it is a command to be distributed, a truncation
request, etc. (EXEC stage) and parsing the row values (PRS stage)
• Requires SRS 15.5+ with the Advanced Services Option



RepAgent User with NRM thread (SRS 15.5+ w/ ASO)

[Diagram: RepAgent User – (1) LTL batch received → (2) Parsing (cmds) → (3) nrm_write_request queue (sized by exec_nrm_request_limit) → NRM thread – (4) Normalization and packing → (5) exec_sqm_write_request queue (sized by exec_sqm_write_request_limit) → SQM.]


Request limits and write waits

We now have two inter-thread request buffers

• exec_nrm_request_limit (default 8MB)
• exec_sqm_write_request_limit (default 1MB, recommend 8MB)
Sizing considerations
• Both of these should be about 8MB. Really large values are ineffective
◦ Values above 8MB are likely suspect
• If the SQM is lagging that badly, it suggests the IO subsystem is incapable of keeping up – and large buffers
are just trying to mask the problem
◦ Which they will until the buffer fills….which might take a few minutes….

Monitoring
• RAWaitNRMTime (58038) – exec_nrm_request_limit reached
◦ The counter_obs value is the number of times this happened
◦ The counter_total value is the total time during the sample period spent waiting
• RAWriteWaitsTime (58019) – exec_sqm_write_request_limit reached



Asynchronous Parsing

The wide table problem

• SRS engineering noticed that most of the time in the RepAgent User thread was spent parsing the LTL rows
• The issue turned out to be related to wide tables
◦ the wider the table, the longer the parsing, as would be expected
◦ this often took 2x longer than normalization
– mostly due to the fact that DBAs had long gotten used to repdef column ordering being the same as the table ordering, which minimized the
amount of work normalization had to perform

SRS 15.7+ adds "Asynchronous Parsing" with the ASO option

• A mix of both parallel and pipeline scalability
• Termed "Asynchronous Parsing" because
◦ parallel parsing is optional – you can specify just 1 async parsing thread….or more as desired
◦ parsing now happens after the ACK is returned to the RepAgent – so the entire RepAgent User thread processing is asynchronous
• Why both?
◦ If just pipelining – i.e. having each stage parse the next 5 columns – there would be no benefit for smaller tables
◦ In addition, with multi-threaded RepAgents, it was felt that the RA might be ready to send more packets before processing
was complete



RepAgent User…..evolution….Parallel PRS in 15.7.1

[Diagram: RepAgent User (EXEC) – (1) LTL batch → (2) available LTL buffers handed to the PRS thread(s) → (3) Parsing (cmds) → (4) nrm_write_request queue (exec_nrm_request_limit) → NRM thread: normalization (cmd structure) and packing, with a batch SYNC to preserve order → (5) exec_sqm_write_request queue → (6) SQM.]


Enabling Asynchronous Parsing (1)

Configuration Parameters
• exec_prs_num_threads
◦ Specifies the number of async parser threads for the connection
◦ Default is 0. Max is 20.
• ascii_pack_ibq
◦ Specifies whether the messages written to the IBQ of a connection are ASCII packed
◦ Default is "off" (so binary pack by default)
• async_parser
◦ An "alias" configuration (like parallel_dsi) that when set to "on" simultaneously sets the configurations
– exec_prs_num_threads to 2
– ascii_pack_ibq to "on"
– cmd_direct_replicate to "on"
– dist_cmd_direct_replicate to "on"
◦ When set to "off", all of those configurations are set to their defaults
• exec_nrm_request_limit
◦ Controls both the buffer between the PRS and NRM threads as well as the LTL batch buffer sizes
◦ However, the total memory consumption for SRS is
– exec_prs_num_threads * exec_nrm_request_limit → NRM thread buffering



Enabling Asynchronous Parsing (2)

Sequence to Enable (see the sketch below)
• Stop the RepAgent via sp_stop_rep_agent
◦ Not totally necessary, as configuring will cause it to disconnect
• Suspend the distributor (suspend distributor in SRS)
• Configure async parsing
• Resume the distributor
• Restart the RepAgent
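
A minimal sketch of that sequence, assuming a primary connection named PDS.PDB and database pdb (names are placeholders; verify the exact syntax for your version):

-- In ASE at the primary:
exec sp_stop_rep_agent pdb
go
-- In SRS:
suspend distributor PDS.PDB
go
alter connection to PDS.PDB set async_parser to 'on'
go
resume distributor PDS.PDB
go
-- Back in ASE:
exec sp_start_rep_agent pdb
go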
Limitations and comments
• Currently not compatible with the EXEC command cache
• Currently not compatible with non-passthru RepAgents
◦ As a result, this works only with ASE sources – RAX does not connect via passthru (not available in JDBC)
• Because IO processing and reparsing are large bottlenecks, no benefit will be noticed if the SQM command
cache is not used
◦ Make sure cmd_direct_replicate and dist_cmd_direct_replicate are enabled
◦ Watch the SQM command cache sizes
◦ We will discuss these concepts more in the SQM section


Yet Another Parsing Solution: EXEC Cmd Cache

Despite the size of the schema, most transactions affect <<10% of the tables

• Generally 10-12 primary volatile schema tables – even in SAP's ERP with 30K tables
• Large bulk operations affect a large number of rows from the same table (archiving, etc.)
• The RepAgent and RepAgent User threads were constantly packaging the same tables into LTL and reparsing them over
and over again

Solution: EXEC command cache

• Requires ASE 15.7 ESD #1 or higher with RS 15.7.1 – not available in earlier releases
• Works similar to SQL fully prepared statements
◦ The RepAgent maintains a cache of frequently used commands as well as "handles"
◦ The RepAgent tells RS to cache a newly seen command in the EXEC command cache
– Sends a tokenized form, just like a fully prepared statement in SQL APIs
◦ RS caches the command with parse offsets/datatypes for the columns
◦ The RepAgent then simply tells RS "this is command 5" – and sends LTL of the values vs. the full LTL format
◦ RepAgent User in RS parses as per command 5 in the command cache
• Enabled in the RepAgent (see the sketch below)
◦ sp_config_rep_agent 'ltl metadata reduction', {'true' | 'false'}
– default = false; to enable, set to true
◦ SRS exec_max_cache_size (default 1MB) controls the size of the cache
– Can be set at server level (configure replication server) or per connection (alter connection) as usual


RepAgent/RepAgent User Feature Map

Feature | ASE | RAX* | RS | ASO Req'd | Comments
RepAgent kernel threading | 15.7+ | N/A | - | - |
Multi-threaded RepAgent | 15.7+ | 12.6+ | 15.5+ | Yes | Separate scanner/sender
Parallel LTL Formatters (non-MPR) | No | Yes | Any | - |
RepAgent Multi-Path Replication | 15.7+ | Yes | 15.7+ | Yes |
Multiple RepAgents with MPR | 15.7 sp100+ | Yes | 15.7.1+ | Yes | Need to create rep filters in ASE
SRS EXEC Cmd Cache | 15.7+ | 15.7.1 | 15.7.1 | - | Enable in ASE
SRS NRM Thread | Any | Yes | 15.5+ | Yes |
Async Parser | Any | No | 15.7.1 | Yes |

*RAX → heterogeneous replication agents (e.g. RAO).


Performance Monitoring: Throughput

counter_id | display_name | description
58000 | CmdsRecv | Commands received by a Rep Agent thread.
58001 | CmdsApplied | Applied commands written into an inbound queue by a Rep Agent thread. Applied commands are applied as the maintenance user.
58002 | CmdsRequest | Request commands written into an inbound queue by a Rep Agent thread. Request commands are applied as the executing request user.
58003 | CmdsSystem | Repserver system commands written into an inbound queue by a Rep Agent thread.
58004 | CmdsMiniAbort | 'mini-abort' commands (in ASE, SAVEXACT records) processed by a Rep Agent thread. Mini-abort instructs Repserver to roll back commands to a specific OQID value.
58011 | BytesReceived | Bytes received by a Rep Agent thread. This size includes the TDS header size when in 'passthru' mode.
58013 | BuffersReceived | Number of command buffers received by a RepAgent thread. Buffers are broken into packets when in 'passthru' mode, or language 'chunks' when not in 'passthru' mode. See counter 'PacketsReceived' for these numbers. Author's note: In later revs of SRS, PacketsReceived was deprecated and is replaced by the RepAgentRecvTime counter_obs value.
58014 | EmptyPackets | Number of empty packets received in 'passthru' mode by a Rep Agent thread. These are 'forced' EOMs. See counter 'PacketsReceived' for these numbers.
58022 | RSTicket | rs_ticket markers processed by a Rep Agent's executor thread.
58023 | RepAgentRecvTime | The amount of time, in milliseconds, spent receiving network packets or language commands. Author's note: In later revs of SRS, PacketsReceived was deprecated and is replaced by the RepAgentRecvTime counter_obs value….so this is the focus in the discussion on throughput.
58037 | TotalBytesReceived | Accumulated total bytes received by a Rep Agent thread so far.


Comments on throughput counters

CmdsRecv vs. BytesReceived

• Both should be normalized to cmds/sec & Mbit/sec for usefulness
◦ Cmds/sec is a good overall throughput metric as it can be compared better across the different threads/modules in SRS
processing
– So, if receiving 1000 cmds/sec, all the rest of the SRS threads also need to process 1000 cmds/sec to avoid latency
– Due to differences in packing, command structures, etc., bytes cannot be compared across threads/modules
◦ Mbit/sec is useful for judging network utilization of inbound data
– It would have to be summed across all connections for the full inbound network utilization
– Not as critical anymore with 10GbE….but still may be a factor with 1GbE
– The full picture would need to add in DSI and RSI network utilization…and a guess at RSSD interaction
• Most of the time, CmdsApplied will be nearly the same as CmdsRecv
◦ Unless SQLDML replication is enabled (and even then the individual rows are sent), most of the commands replicated will
be sent as applied vs. request

Some useful derived statistics (see the worked example below)

• Bytes/Cmd → how big each replicated command is on average in LTL
◦ This can be skewed low due to begin/commit pairs – especially due to atomic transactions
• Cmds/Packet → how many commands fit in an LTL packet (ASE RepAgent)
• Cmds/Buffer → how many commands fit in an LTL buffer (heterogeneous RA)
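
A worked example, assuming a 60-second sample: 60,000 CmdsRecv and 90,000,000 BytesReceived work out to 1,000 cmds/sec, 1,500 bytes/cmd, and 1.5MB/sec ≈ 12 Mbit/sec of inbound network utilization for that connection.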
Performance Monitoring: Cmd Breakdown (1)

counter_id | display_name | description
58000 | CmdsRecv | Commands received by a Rep Agent thread.
58001 | CmdsApplied | Applied commands written into an inbound queue by a Rep Agent thread. Applied commands are applied as the maintenance user. Author's note: This should be most commands, but may be much smaller than CmdsRecv as begin tran/commit tran are not considered applied nor request commands.
58002 | CmdsRequest | Request commands written into an inbound queue by a Rep Agent thread. Request commands are applied as the executing request user. Author's note: This will be for stored procs replicated as request functions only.
58003 | CmdsSystem | Repserver system commands written into an inbound queue by a Rep Agent thread. Author's note: rare – likely only seen during setup/changes to rep subscriptions, etc.
58004 | CmdsMiniAbort | 'mini-abort' commands (in ASE, SAVEXACT records) processed by a Rep Agent thread. Mini-abort instructs Repserver to roll back commands to a specific OQID value. Author's note: This should only happen when a user rolls back and the transaction is larger than the user log cache size. A lot of these can really slow down SRS – but they also slow down ASE.
58005 | CmdsDumpLoadDB | 'dump database log' (in ASE, SYNCDPDB records) and 'load database log' (in ASE, SYNCLDDB records) processed by a Rep Agent thread. Author's note: This likely will be seen during materialization/re-sync times or when using replicated coordinated dumps.
58006 | CmdsPurgeOpen | CHECKPOINT records processed by a Rep Agent thread. CHECKPOINT instructs Repserver to purge to a specific OQID value. Author's note: This refers to normal checkpoints, so you should see one every 60 seconds or so from ASE. Only when the RA first connects are the open transactions actually purged.
58007 | CmdsRouteRCL | Create, drop, and alter route requests written into an inbound queue by a Rep Agent thread. Route requests are issued by the RS user. Author's note: This helps prevent data loss/duplicate commands during a topology change when an IRS is swapped.
58008 | CmdsEnRepMarker | Enable replication markers written into an inbound queue by a Rep Agent thread. The enable marker is sent by executing the rs_marker stored procedure at the active DB. Author's note: This should primarily be seen only during subscription creation, dropping, etc.
58009 | UpdsRslocater | Updates to RSSD..rs_locater where type = 'e' executed by a Rep Agent thread. Author's note: This will happen with each gettrunc() request – which is driven by the RA scan_batch_size, etc.


Performance Monitoring: Cmd Breakdown (2)

counter_id | display_name | description
58021 | CmdsSQLDDL | RepServer SQLDDL commands written into an inbound queue by a Rep Agent thread. Author's note: This refers to DDL replication and should only happen with sp_reptostandby or in a standby setup in which DDL replication is desired.
58022 | RSTicket | rs_ticket markers processed by a Rep Agent's executor thread. Author's note: Useful only to determine if rs_ticket is being sent too frequently. For most purposes, one ticket every 5 minutes is fine. If using HVAR or RTL, an rs_ticket could cause premature flushing of the CDB, hence the need to watch this.
58027 | SQLDMLUpd | SQLDML update commands received by a RepAgent thread. Author's note: This refers to the number of update SQL language commands replicated as both row images and SQLDML. If the number seems low, you will need to look at the source ASE MDA tables to see why SQLDML replication was not used.
58028 | SQLDMLDel | SQLDML delete commands received by a RepAgent thread. Author's note: This refers to the number of delete SQL language commands replicated as both row images and SQLDML. If the number seems low, you will need to look at the source ASE MDA tables to see why SQLDML replication was not used.
58029 | SQLDMLSelInto | SQLDML select into commands received by a RepAgent thread. Author's note: This refers to the number of select/into SQL language commands replicated as SQLDML. If the number seems low, you will need to look at the source ASE MDA tables to see why SQLDML replication was not used.
58030 | SQLDMLInsSel | SQLDML insert select commands received by a RepAgent thread. Author's note: This refers to the number of insert/select SQL language commands replicated as both row images and SQLDML. If the number seems low, you will need to look at the source ASE MDA tables to see why SQLDML replication was not used.


Some config considerations

Cmds/Packet (ASE RepAgent)

• Ideally you want about 5-6 cmds/packet. If only 2-3, with a lot of atomic transactions, you really are only
getting about 1 useful command per packet, as the others are likely the begin/commit
• To improve, increase the RepAgent's packet size

Cmds/Buffer (heterogeneous RA)

• An LTL buffer is typically 40KB to 64KB
• If you assume ~1KB per replicated command, ideally you want 40-50 cmds/buffer
• If you see a lot fewer, you may have wide rows
◦ You can try to increase the buffer size, but the biggest benefit will likely come from using parallel LTL formatters

UpdsRslocater/min (see the sketch below)
• If too frequent, processing is slowed by constant OQID updates to the RSSD (tens of ms each)
• Since this controls how fast recovery is, even 1 per minute would support <1 minute recovery
• Consequently, you only want to see ~<10 per minute (which is 6-second recovery)
• To change, increase the scan batch size up to 10K or 20K max
◦ If already at that level, do not increase further
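
For instance, raising the scan batch size on the primary ASE might look like this (the database name is a placeholder; the RepAgent may need a restart to pick up the change):

-- In ASE: update the secondary truncation point (rs_locater) less often
exec sp_config_rep_agent pdb, 'scan batch size', '10000'
go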



Performance Monitoring: Processing & wait times

counter_id | display_name | description
58016 | RAYieldTime | The amount of time the RepAgent spent yielding the processor while handling LTL commands, each time the processor was yielded. High values here likely indicate that the SRS was upgraded from a previous version and the config exec_cmds_per_timeslice is still at a low value (vs. the new default of ~2B).
58019 | RAWriteWaitsTime | The amount of time the RepAgent spent waiting for the SQM Writer thread to drain the outstanding write requests to get the number of outstanding bytes to be written under the threshold.
58023 | RepAgentRecvTime | The amount of time, in milliseconds, spent receiving network packets or language commands.
58025 | RepAgentExecTime | The amount of time, in milliseconds, the RepAgent User thread is scheduled by OCS.
58031 | RepAgentParseTime | The amount of time, in milliseconds, spent parsing commands.
58033 | RepAgentNrmTime | The amount of time, in milliseconds, spent normalizing commands.
58035 | RepAgentPackTime | The amount of time, in milliseconds, spent packing commands.
58038 | RAWaitNRMTime | The amount of time the RepAgent spent waiting for the NRM thread to drain the commands on the message queue to get the number of outstanding bytes to be written under the threshold.
58040 | RACmdsDirectRepSend | Number of commands directly sent from the Executor with pre-packing data format.
58041 | RepAgentExecutorTime | The amount of time, in milliseconds, spent in function _ex_executor_cmd().
58043 | RAControlMem | Number of times the memory control is executed from the Executor.
58044 | RAWaitMemTime | The amount of time the RepAgent thread spent waiting for memory usage to drop under the memory control threshold.


Interpreting the times….

RepAgentExecTime
• RepAgentExecTime is a good indication of how much CPU time the RepAgent User is getting
◦ This is sort of a sum of all the CPU time for the RepAgent User thread
– ….or at least until you start splitting out the NRM, async parser, etc.
• Compare it to the sample interval and look at CPU usage on the host
◦ If there are more CPU resources available, this number is NOT the full sample period, and you have RepAgent latency but no
RA______Waits, then the likely cause is the ASE process kernel

Packet Recv → EXEC → PRS → NRM → PAK

• The corresponding times for each stage of RepAgent User processing are in the appropriate counters
• A useful way to see if async parsers are necessary or whether the NRM thread by itself will help
◦ May help drive the business case to upgrade to 15.7.1+ sooner vs. later



The (not) good, bad and the ugly….waits

RAYieldTime
• RAYieldTime is tied to the legacy exec_cmds_per_timeslice
◦ In older pre-SMP versions, this was used to try to keep the RepAgent User from consuming too much CPU time
◦ Unfortunately, with a common cause of problems being RepAgent latency, it was counter-productive in most cases
• If you see this, you likely upgraded from an older release – just set it to 2B and forget it
◦ Later versions default this to 2B

RAWriteWaitsTime & RAWaitNRMTime

• Discussed before – these are related to exec_sqm_write_request_limit and exec_nrm_request_limit
respectively
• As discussed, if configured for ~8MB, any waits here are likely due to inbound queue issues that need to be
addressed

RAControlMem and RAWaitMemTime

• Either SRS is not configured for enough memory or there are some misconfigurations that are using too much
memory (e.g. SQT cache too large, SQM page cache too large, etc.)
• This is really bad….the ugly….deal with it immediately



RepAgent User Summary

If you have RepAgent latency….

• Check how long the RepAgent User thread is executing
◦ If the time is a good portion of the sample period, you may need to use ASO option features such as the NRM thread or Async
Parser
◦ If the time is a small portion, the problem is one of, or a combination of:
– network speed
– ASE process kernel
– SQM inbound queue speed (you will see RAWriteWaitsTime)
• If using ASE 15.7 ESD 1+, an alternative to the Async Parser that might help is the EXEC command cache
Main tuning considerations
• The primary consideration is to increase pipelining and decrease ACK delay
• If upgrading from older releases, change your defaults to match current defaults at a minimum
◦ exec_cmds_per_timeslice (2B), exec_sqm_write_request_limit (8MB)
• Once set, the main caches for write requests and command caches do not need much adjusting
◦ If there are issues, it likely is a problem that increasing the cache will just mask vs. resolve
◦ For example, if the 1MB EXEC command cache is not big enough, it likely is a problem that requires more parsing
resources via multiple async parsers vs. increasing the cache



RepAgent User starting configuration in SRS

Config | Default | Recommended | Rec w/ ASO | Rec w/ ASO + Async Parse
async_parser | off | (n/a) | off | on
ascii_pack_ibq | off | (n/a) | off | on
cmd_direct_replicate | off | on | on | on
exec_cmds_per_timeslice | 2147483647 | 2147483647 | 2147483647 | 2147483647
exec_max_cache_size | 1048576 | 1048576 | 1048576 | 1048576
exec_nrm_request_limit | (n/a) | (n/a) | 8388608 | 8388608
exec_prs_num_threads | (n/a) | (n/a) | (n/a) | 2
exec_sqm_write_request_limit | 1048576 | 8388608 | 8388608 | 8388608
nrm_thread | off | (n/a) | on | on
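
Applying the non-ASO recommendations to a single heavy connection might look like this (the connection name is a placeholder; some parameters require suspending the connection or restarting SRS to take effect):

alter connection to PDS.PDB set exec_sqm_write_request_limit to '8388608'
go
alter connection to PDS.PDB set cmd_direct_replicate to 'on'
go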



Inbound SQM Processing
The second stage in inbound processing
SQM processing….in the not-so-good old days

Pre-SRS 15.2 SQM processing

• The SQM only had a single block in cache
• The SQM would read from the exec_sqm_write_request queue and append to the block
• When the block was full, it would attempt to flush it to disk
• If a timer expired before the block was full, it would also attempt to flush it to disk
• Once the block was flushed, the SQMR/SQT had to do a physical read to get the block

How the timer works…the theory….

• The timer minimum is init_sqm_write_delay and the maximum is init_sqm_write_max_delay
◦ init_sqm_write_delay (default 2000ms or 2 seconds)
◦ init_sqm_write_max_delay (default 10000ms or 10 seconds)
• The timer starts at init_sqm_write_delay
• If the timer expires and no new data has been added (the block is not full), the block is flushed to disk and the timer is
doubled for the next block….
• ….this continues until the timer reaches ~2x init_sqm_write_max_delay
• If a block is flushed due to being full before the timer expires, the timer is reset
• The theory is
◦ During idle periods, SRS will gradually wait longer before writing to disk – saving disk space
◦ As soon as there is considerable activity, SRS will revert to a shorter wait cycle
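
As a worked example with the defaults: an idle queue flushes a partial block after 2s, then 4s, 8s, and 16s, leveling off at roughly 20s (~2x init_sqm_write_max_delay); the first block flushed because it filled resets the timer back to 2s.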



SQM Write Delay & SQT Read Delay

SQM will wait until the block is full before writing
• init_sqm_write_delay & init_sqm_write_max_delay
• Once a block is written, the buffer is reused for the next one
• Setting the delay too low can impact SQT cached reads (pre-15.2 page cache)
• Setting it too high can impact latency in low-volume cases

[Chart annotation: ~70 blocks were written to disk due to timer expiration vs. full blocks (the total was ~450, so this is about 15%). This needs to be measured during peak periods; if 30%+, it might indicate a need to raise these values.]


Configurable Block Sizes (ASO option required)
Prior to RS 15.5, this was a static 16K
RS 15.2 work-around → multi-block 'pages'
• Can be set for each queue individually
◦ alter queue, queue_number, queue_type, set sqm_page_size to 'nnn'
– nnn is in the range 1–64; the default is 4 when enabled
• Defines the number of 16K blocks per page, so a setting of 4 instructs RepServer to write and read the queue in 64K
pages

RS 15.5 supports large block I/O

• configure replication server set block_size to '###' with shutdown
◦ Valid values are 16K, 32K, 64K, 128K & 256K
• Still a single block write & cache without 'pages'

A segment is still 64 blocks…..

• So….a segment can be 1MB, 2MB, 4MB, 8MB or 16MB
• Partitions are still created on even 1MB boundaries
◦ Be careful creating them - use a 16MB boundary.
• admin who,sqm: Last Seg.Block - Next.Read
◦ You have to know the block size to figure out how much space is in the queue or how far behind it is (see the worked example below)
◦ Also impacts calculations based on RS MC counters such as SQMR.SQMRBacklogSeg and SQM.SegmentsAllocated, etc.
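
For example, with a 256KB block size, if admin who,sqm shows Last Seg.Block = 120.30 and Next Read = 100.5, the backlog is (120*64+30) - (100*64+5) = 1,305 blocks ≈ 326MB; the same numbers at the default 16KB block size would mean only ~20MB.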



Block Sizes: Notes & Warnings

Warning!!!
• ONLY use the configure replication server command
◦ do not update the RSSD directly
• Make sure the queues are empty and RS is quiescent
◦ suspend log transfer from all
◦ wait for the DSI and RSI to drain - and the save interval to expire
◦ admin quiesce_force_rsi
◦ (optional) sysadmin sqm_purge_queue
◦ configure replication server set block_size to '256' with shutdown
◦ (restart RS)
◦ resume log transfer from all
• The change happens on reboot of RS - so it may take a while to come up
OS Changes!!!
• You may need to change OS kernel settings to allow larger IOs
◦ set vxio:vol_maxio=16384 → max IO size before the IO is broken up
◦ set maxphys=8388608 → largest IO
◦ Check sd/ssd.conf (Solaris)
When to use larger block sizes

• High-volume throughput
• Wide column tables
• Whenever using HVAR or RTL (the need for these is almost always due to high volume)
Benefits
• Reduces the number of I/Os to the OS and increases I/O efficiency
• Reduces segment fragmentation when multiple queues use the same devices
• Reduces updates to the RSSD
Reducing updates to the RSSD
• Every segment allocation has to be recorded in the RSSD
◦ Consider a 1K command size and 2000 cmds/sec
– We would get 16 cmds per block….or 1024 per segment
– We would be allocating 2 segments per second
• For recovery purposes, we update the read position based on OQIDs
◦ We can control this a little bit by increasing sqm_recover_seg from 1 to 10 – but this is sort of a band-aid and doesn't
help the allocation speed issue
• Using a larger block size resolves both of these
Problem: SQT/DSI often forced to do physical reads

Pre-SRS 15.1/15.2 there was only a single block in cache

SQT/DSI/RSI were often forced to do physical reads

• If a reader lagged even 1 block or had to wait even a little bit, the block it wanted to read would no longer be in cache
◦ There were many times when this happened – we will discuss some shortly
• Also, if the current block was filled, the SQM would flush it to disk and start a new one, so the SQT would be forced to
read from disk to get the last rows

Introducing the "SQM Page Cache"

• The SQM now has a cache to hold previous blocks in memory in an MRU-type chain
• SQT/DSI/RSI will now only have to read from disk if the block being read is older than those cached
◦ This may happen for large transactions that are removed from SQT cache
• The introduction of page-sized I/Os vs. block-sized was made to try to improve device write speeds, as most
HDDs like larger/sequential I/Os vs. smaller/random ones
◦ The page sizes also help reduce the number of I/Os issued – reducing system time



SQM Page Cache

[Diagram: RepAgent User (LTL ASCII stream → parsing → normalization → packing) posts packed binary commands to the exec_sqm_write_request queue; the SQM appends them to the current block, which now sits in a multi-page SQM page cache; the SQT reads packed commands (from cache where possible) into the SQT cache and passes command structures on to the DIST/SRE.]


Configuring SQM page cache

A SQM "page size" is defined as 1–64 blocks

• The page size defines the unit of stable device I/O
Server-level default configurations
• block_size → 16 (default), 32, 64, 128, or 256
◦ Values other than 16 require the ASO option
◦ This is the only aspect of this feature that requires the ASO option – otherwise, the SQM page cache is a fully available
feature in all versions of SRS
• sqm_cache_enable → on or off (default is on)
• sqm_page_size → number of blocks in a page (default is 4)
◦ This is now the IO size for SRS – with large block sizes, you may have to tune the OS
• sqm_cache_size → number of pages (default is 16)
Connection-level configs only via alter queue (see the sketch below)
• Can be set for each queue individually
◦ alter queue, queue_number, queue_type, set sqm_cache_enable to { 'on' | 'off' }
– Default is 'off'
◦ alter queue, queue_number, queue_type, set sqm_cache_size to 'nnn'
– nnn is in the range 1–512; the default is 16
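
For example, deepening the cache on just the high-volume inbound queue might look like this (the queue number is a placeholder; verify the queue number and type — conventionally 1 = inbound, 0 = outbound — with admin who,sqm):

alter queue, 103, 1, set sqm_cache_enable to 'on'
go
alter queue, 103, 1, set sqm_cache_size to '64'
go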



Configuring SQM page cache: some considerations (1)

Be careful when configuring

• Remember, every connection has 2 queues
◦ Using the defaults, that means every connection uses 2 * 16KB/blk * 4 blks/pg * 16 pg = 2MB
◦ With 256KB blocks, we then have 2 * 256KB/blk * 4 blks/pg * 16 pg = 32MB (!!!) per connection
• If SRS is hosting a lot of connections, consider using alter queue instead of increasing the server
configs
• Generally
◦ If striving for very high volume and you have a lot of memory, you can support ~4GB easily enough
• If using older 32-bit releases (e.g. 15.2 and previous), upgrade…upgrade….upgrade
◦ SQM page cache is in 15.1/15.2….but will quickly exhaust memory if configured wrong

Generally speaking, the SQM page cache is there to avoid physical reads

• In other words, it buffers data when SQT/DIST lag slightly during surges
• If you are constantly using a lot of page cache, then you are essentially using it to extend SQT cache…..
• For most connections, this means 8MB to 128MB is more than enough
◦ Actually, for low-volume connections, the defaults are more than enough
• The SQM command cache builds on the SQM page cache
Configuring SQM page cache: some considerations (2)

The biggest benefit will be on SQMR reads in high throughput

• Presumes SQT and DIST are tuned…otherwise the cache just delays the inevitable
• Solve SQT cache size problems first
◦ dist_sqt_cache_size & dsi_sqt_cache_size

The biggest write impact will be on slow devices w/ small block_sizes

• On faster devices (e.g. RAID 0, 10), the performance of the device will likely mask any improvements from either
page size or write caching
• Consider it when using…
◦ RAID 5 or 6
◦ SRDF over an appreciable distance such that write performance is slowed
◦ Large transactions frequently removed from SQT cache

Notes
• Not supported on HPUX (PA-RISC or HP-IA)



Monitoring SQM Page Cache

Queue readers track when physical reads were used
• SQMR counters: BlocksRead vs. BlocksReadCached
• The goal is to keep the cached-read percentage high (see the query sketch below)
◦ It doesn't have to be 100% - even 70% is good
◦ Nearly impossible for the DSI due to replicate DBMS speed

Counter ID | Display Name | Description
62002 | BlocksRead | Number of blocks read from a stable queue by an SQM Reader thread.
62004 | BlocksReadCached | Number of blocks read from cache by an SQM Reader thread.
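
A hedged sketch of the cached-read ratio from saved counter data (as before, the rs_statdetail column names shown are an assumption to verify against your SRS version):

-- CachedPct = BlocksReadCached / BlocksRead * 100 (zero-read guard omitted for brevity)
select 100.0 * sum(case when d.counter_id = 62004 then d.counter_obs else 0 end)
             / sum(case when d.counter_id = 62002 then d.counter_obs else 0 end) as cached_pct
  from rs_statdetail d
 where d.counter_id in (62002, 62004)
go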



What You Don’t Want to See

[Chart: SQMR cached-read percentage, CachedPct = (BlocksReadCached/BlocksRead)*100%, flat-lining at ~0%. The SQT cache looks fine, but really the problem is being masked by low throughput due to physical reads.]

The SQT is seriously lagging the inbound queue
• Almost no SQMR cached reads - all physical reads
No transactions were removed (SQT.TransRemoved)
• So the problem is SQM write cache or SQM/SQMR delay tuning

Analysis: SQT (SQMR) read speed is limiting the throughput rate

• With no SQM page cache, as soon as it gets behind the current block, it has to do physical reads - at which point, without a lull
in activity, it will never be able to catch back up….so it flat-lines at 0% cache hits until the next lull in activity.


Adding SQM Write Cache & Pages

Using a 256KB block size, after adding…

• 4 blocks/page (sqm_page_size)
• 16 pages for sqm_cache_size
• Total SQM write cache = 16 pages * 4 blocks/page * 256KB/block = 16MB

Analysis: SQT (SQMR) read speed is fine up until some limit, at which point degradation happens.
This limit turns out to be DIST throughput, as we will see later.

[Chart: CachedPct = (BlocksReadCached/BlocksRead)*100% over time.]


The need for the SQM Command Cache

One of the most expensive processes is parsing

• Even with built-in string tokenizers (e.g. strtok), the string must be serially scanned
◦ This gets more fun with multi-byte character sets

In order to assure persistence, SRS writes to stable queues

• Commands get packed into record formats that will need to be reparsed to be used
There are four points of parsing/reparsing in SRS
• RepAgent User (unavoidable due to LTL)
• DIST – needs to parse to determine subscription values, etc.
• DSI – needs to parse to form SQL
• RSI User (in the RRS of a route) – needs to parse the RTL – unavoidable, as with LTL
Note that the SQT does NOT need to parse the commands
• It only has to know which transaction the command belongs to (part of the command header) and the
table/operation involved (for transaction profiling for bulk operations)



A Picture is Worth 1,000 Words: Parse & Reparse cmds

[Diagram (inbound side): the LTL ASCII stream (packets) is parsed into commands, normalized into a command structure, and packed into a binary command that goes through the exec_sqm_write_request queue to the SQM page cache/current block; the SQT later reads the packed binary command back into the SQT cache, and the DIST/SRE must reparse it into a command structure.]


…And Then We Do It Again (OBQ)

[Diagram (outbound side): the DIST/SRE packs the command structure into a packed ASCII command and posts it via the md_sqm_write_request queue to the outbound SQM page cache/current block; the DSI's SQT reads the packed ASCII command back, and the DSIEXEC must reparse it into a command structure to build SQL.]


Introducing the SQM Command Cache (SRS 15.7)

The new process

• Queue write requestors (such as RepAgent User) now send both the command in packed format and
the command structure to the SQM
• The SQM caches the parsed command structure in the command cache
◦ The packed command is essentially also cached in the SQM page cache, as it is appended to the current block
• Queue readers (e.g. SQT) use the packed command for processing
◦ This saves memory over the unpacked command structure
• DIST/DSI now attempt to bypass the structure
◦ When a command is read from SQT cache, they first check to see if it is in the SQM command cache
◦ If it is in the SQM command cache, they simply retrieve the already parsed command structure and skip the parsing step
◦ If it isn't in the SQM command cache, they parse the command as they did previously

Some notes
• Due to the SQT sleeping a lot (contention, PIO, etc.), this works best with DIST if dist_direct_cache_read is
enabled (requires ASO)
• For the DSI, reparsing may happen a lot if there is considerable latency



The New & Improved Picture (IBQ) – RS 15.7+

[Diagram: as before, the packed binary command flows from parsing → normalization → packing through the exec_sqm_write_request queue into the SQM page cache/current block, but the parsed command structure is now cached alongside it in the SQM command cache, so the SQT → DIST path can skip the reparse while the structure is still cached.]


SQM Command Cache: Notes

The parsed structure is not written to disk

• It is only kept in memory in the SQM command cache

Why the SQM for caching (vs. RepAgent/DIST)?

• Remember, a SQM may have multiple readers
• Also remember, the SQM is where data is purged after being delivered
• So the SQM is the logical place to associate it with, as the SQM:
◦ can best keep track of whether all the readers have read it or not
◦ knows when it can be discarded from cache (e.g. when segments are deallocated, the corresponding commands are removed from the SQM cache)

The SQM command cache is a FIFO queue

• So it can overflow
• In which case, you have to read from disk (or the SQM page cache if you are lucky) and reparse the command

So you really have to watch for 2 things:

• Avoid reading from disk
◦ BlocksReadCached ≈ BlocksRead
• Avoid reparsing the same commands
◦ DISTCmdsDirectRepRecv ≈ (DIST) CmdsRead
◦ DSIESkipUnpack ≈ DSIECmdsRead (going to be tough with DSI latency)
SQM Command Cache: Config

Server- or connection-level settings

• configure replication server (requires a restart of RS)…. or….
• alter connection (requires suspend/resume of log transfer?? …or a restart of RS???)

Recommend connection level beyond the defaults (see the sketch below)

• Avoid wasting memory on low-volume connections.

Configuration Parameter | Default | Description
cmd_direct_replicate | off | Set cmd_direct_replicate on for the Executor thread to send parsed data directly to the Distributor thread along with binary data. When required, the Distributor thread can retrieve and process data directly from parsed data, and improve replication performance by saving time otherwise spent parsing data again.
sqm_cmd_cache_size | 1MB (32-bit) / 20MB (64-bit) | The maximum size, in bytes, of parsed data that Replication Server can store in the SQM command cache. Ignored if cmd_direct_replicate is off.
sqm_max_cmd_in_block | 320 | Specifies, in each SQM block, the maximum number of entries with which the parsed data can associate. Set the value of sqm_max_cmd_in_block to the number of entries in the SQM block. Depending on the data profile, each block has a different number of entries because the block size is fixed, and the message size is unpredictable. If you set a value that is too large, there is memory waste. If you set a value that is too small, replication performance is compromised.
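
A connection-level sketch (names and values are placeholders; check whether your version needs a suspend or restart for these to take effect):

alter connection to PDS.PDB set cmd_direct_replicate to 'on'
go
alter connection to PDS.PDB set sqm_cmd_cache_size to '41943040'
go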



SQM Command Cache: Tuning (1)

So….why sqm_max_cmd_in_block???
• There really are two parts to the SQM command cache
◦ The command cache itself
◦ An array of pointers
• sqm_max_cmd_in_block is used to dimension the array of pointers
• If you think about it, we want to cache every command that is in the SQM page cache
◦ So….if we are averaging 10 commands per block and have a 4-block page (default) and a 16-page cache, we need to
dimension an array of 10*4*16=640
◦ In this case, we would set sqm_max_cmd_in_block to 10

When setting the SQM command cache memory configuration:

• Increase sqm_cmd_cache_size if the Replication Server has a large total SQM cache.
◦ Total SQM cache = sqm_cache_size (in pages) * sqm_page_size (in blocks) * block_size (in kilobytes)
◦ sqm_cmd_cache_size will need to be some factor larger than this (e.g. 3-4x)
• sqm_max_cmd_in_block
◦ The default of 320 may be a bit high for a 16K block
– If we assume 1K per cmd on average, we would only need 16
– The problem isn't the average – it is the peak (e.g. empty transactions) during the amount of time covered by the cached blocks…320
doesn't use that much memory (1K per block cached)
◦ Monitor with RS MC and decrease gradually



SQM Command Cache: Tuning (2)

Tune the values using RS MC

• Increase sqm_cmd_cache_size if SQMNoDirectReplicateInCache contains a large value.
◦ Translation: I tried to read a command from the SQM command cache and had a cache miss because it wasn't there (the cache
was filled and spilled over)
• Increase sqm_max_cmd_in_block if SQMNoDirectReplicateInBlock contains a large value.
◦ Translation: I tried to add an entry to the cache, but all the slots in the array were already used.
• Increase the SQM page cache if SQMNoDirectReplicateInSQMCache contains a large value.
◦ Translation: I wanted to read a command from the SQM command cache, but the corresponding page wasn't in the SQM page
cache, so I had to read from disk anyhow

Some counters to consider as help in initial settings

• SQM.CmdsWritten/SQM.BlocksWritten = CmdsPerBlock
◦ Calculate for each sample interval and then use the max to set sqm_max_cmd_in_block
◦ However, consider oversizing (e.g. 2-3x) to cover empty transactions, etc.
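
For example, if a peak interval shows CmdsWritten = 48,000 and BlocksWritten = 3,000, then CmdsPerBlock = 16, and oversizing by 2-3x would suggest setting sqm_max_cmd_in_block somewhere around 32-48.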



SQM Command Cache: Monitoring

Counter ID | Module | Display Name | Description
58040 | RepAgent | RACmdsDirectRepSend | Number of commands directly sent from the Executor with pre-packing data format.
6061 | SQM | SQMNoDirectReplicateInCache | Number of commands excluded from direct replication, limited by the SQM command cache memory size.
6062 | SQM | SQMNoDirectReplicateInBlock | Number of commands excluded from direct replication, limited by the number of commands that can be stored in each SQM block.
6063 | SQM | SQMCacheCollisions | Count of cache collisions (overwritten before read).
6064 | SQM | SQMCMDCLONETIME | Time spent on cloning CMD_COMMAND.
6066 | SQM | SQMNoDirectReplicateInSQMCache | Number of commands excluded from direct replication, limited by the SQM cache memory size.
30037 | DIST | DISTCmdsDirectRepRecv | Number of commands received by DIST directly from EXEC.
30038 | DIST | DISTCmdsDirectRepSend | Number of commands directly sent from DIST with pre-packing data format.
57179 | DSIEXEC | DSIESkipUnpack | Number of unpacked commands received by DSI/E through command direct replicate.


SQM Enhancements

SQM Page Cache increased


• sqm_cache_size now supports 4096 pages
 Up from previous limit of 512
 With default 16KB blocks and 4 block pages, this allows SQM page cache to hold 256MB
• Reduces physical reads by SQMR during spikes
Dedicated daemon for deleting segments
• sqm_async_seg_delete -> 'on' by default
• Prior to this, SQM had to both write new data to segments and delete old segments
 SQM segment deletion often was in large chunks due to the intermingling of transactions in the log, the presence of large
transactions, and the low priority of segment deletion in the SQT module
 The result was SQM could spend considerable time deleting segments, and during this time it could not process incoming
data
• May need larger stable queues
 Asynchronous space reclamation

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 68


SQM Feature Map

Feature                    ASE   RAX*   RS        ASO Req’d   Comments
SQM page cache             Any   Any    15.1+
Large block sizes          Any   Any    15.5+     Yes         16, 32, 64, 128, 256
SQM command cache          Any   Any    15.7+
Async SQM delete daemon    Any   Any    15.7.1+

*RAX = Heterogeneous replication agents (e.g. RAO).

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 69


SQM Monitoring: throughput
counter_id display_name description
6000 CmdsWritten Commands written into a stable queue by an SQM thread.
6002 BlocksWritten Number of 16K blocks written to a stable queue by an SQM thread
6004 BytesWritten Bytes written to a stable queue by an SQM thread.
6016 SleepsWaitSeg srv_sleep() calls by an SQM Writer client due to waiting for the SQM thread to get a free segment.
6020 SegsActive Active segments of an SQM queue: the number of rows in rs_segments for the given queue where used_flag = 1. Authors Note: This is preferred for measuring backlog as it includes segments that are cached in SQT, whereas the SQMR backlog counters do not.
6021 SegsAllocated Segments allocated to a queue during the current statistical period.
6022 SegsDeallocated Segments deallocated from a queue during the current statistical period.
6035 AffinityHintUsed Segments allocated by an SQM thread using user-supplied partition allocation hints.
6036 UpdsRsoqid Updates to the RSSD..rs_oqid table by an SQM thread. Each new segment allocation may result in an update of the oqid value stored in rs_oqid for recovery purposes.
6038 WritesTimerPop SQM writer thread initiated a write request due to timer expiration.
6039 WritesForceFlush SQM writer thread has forced the current block to disk when no real write request was present. However, there is data to write and we were asked to do a flush, typically by 'quiesce force rsi' or an explicit shutdown request. Authors Note: Most often, this will be in an RRS and will be due to the route's configured sync interval (or size).
6040 WriteRequests Message writes requested by an SQM client.
6041 BlocksFullWrite Number of full blocks written by an SQM thread. Individual blocks can be written due either to block full state or to sysadmin command 'slow_queue' (only one message per block).
6049 CmdSize Command size written to a stable queue.

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 70


SQM Monitoring: throughput comments

Some interesting derived statistics


• CmdsWritten/sec, BlocksWritten/sec, MBytesWritten/sec
• SegsAllocated vs. SegsDeallocated
• UpdsRsoqid/minute
 If there are too many, increase the SQM block size as a first option….or increase sqm_recover_seg
• Cmds/Block = CmdsWritten/BlocksWritten
Some key stats to consider
• CmdSize
• AffinityHintUsed

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 71


SQM Monitoring: Time
counter_id display_name description
6023 TimeNewSeg The elapsed time, in milli-seconds, to allocate a new segment. Timer starts when a segment is allocated. Timer stops when the next segment is allocated.
6029 TimeSeg Elapsed time, in milli-seconds, to process a segment. Timer starts when a segment is allocated or Repserver starts. Timer stops when the segment is deleted.
6057 SQMWriteTime The amount of time taken for SQM to write a block.
6059 SQMWaitSegTime The amount of time waiting for allocating segments. Authors Note: This is similar to SleepsWaitSeg (e.g. counter_obs should be identical); however, the time is useful in showing how bad the situation is as well as RSSD response time.
6064 SQMCMDCLONETIME Time spent on cloning CMD_COMMAND.

Some derived stats

• IO Throughput (MB/sec) = (block_size * BlocksWritten/1024.0)/(SQMWriteTime/1000.0)
• IO Speed (msPerIO) = SQMWriteTime/BlocksWritten
 Technically we should use the SQM page size for this….if sqm_cache is enabled:
 = SQMWriteTime/(BlocksWritten/sqm_page_size)
• Allocation speed (ms per allocation) = SQMWaitSegTime/SegsAllocated
• Allocation overhead = (SQMWaitSegTime/1000.0) / sample time
• Allocation frequency (AvgNewSeg) = (TimeNewSeg/1000.0)/SegsAllocated
 Think about this one….should you really have new allocations every second???
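
A quick worked example of these formulas (all input values are assumed, purely for illustration):

    -- Assumed sample: BlocksWritten = 20000, SQMWriteTime = 10000 ms,
    --                 block_size = 16KB, SegsAllocated = 20,
    --                 SQMWaitSegTime = 400 ms, sample interval = 60 sec
    -- IO Throughput = (16 * 20000 / 1024) / (10000 / 1000) = 31.25 MB/sec
    -- IO Speed      = 10000 / 20000                        = 0.5 ms per IO
    -- Allocation speed    = 400 / 20                       = 20 ms per allocation
    -- Allocation overhead = (400 / 1000) / 60              = ~0.7% of the sample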

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 72


SQM Monitoring: other
counter_id display_name description
6014 SleepsStartQW srv_sleep() calls by an SQM Writer client due to waiting for SQM thread to start.

6017 SleepsWriteRScmd srv_sleep() calls by an SQM Writer client while waiting to write a special message, such as synthetic rs_marker.

6018 SleepsWriteDRmarker srv_sleep() calls by an SQM Writer client while waiting to write a drop repdef rs_marker into the inbound queue.

6019 SleepsWriteEnMarker srv_sleep() calls by an SQM Writer client while waiting to write an enable rs_marker into the inbound queue.
6037 WritesFailedLoss Writes failed by an SQM thread due to loss detection, SQM_WRITE_LOSS_I, which is typically associated with a rebuild queues operation.
6050 XNLWrites Large messages written successfully so far. This does not count skipped large messages in mixed version situations.
6052 XNLSkips Large messages skipped so far. This only happens when the site version is lower than 12.5.
6053 XNLSize The size of large messages written so far.
6061 SQMNoDirectReplicateInCache Number of commands excluded from direct replication limited by SQM command cache memory size. Author: SQM command cache too small.
6062 SQMNoDirectReplicateInBlock Number of commands excluded from direct replication limited by the number of commands that can be stored in each SQM block. Author: This probably won’t happen unless you have a lot of empty transactions or a large block size and really narrow row widths.
6063 SQMCacheCollisions Count of cache collisions. Author: cache overwritten before read – SQM cmd cache too small or there is latency (e.g. SQT cache is full and DIST lagging).
6066 SQMNoDirectReplicateInSQMCache Number of commands excluded from direct replication limited by SQM cache memory size. Author: SQM page cache too small for the SQM cmd cache.

You probably don’t need to be concerned about most of these except the SQM command cache ones
(highlighted - discussed previously)

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 73


SQM starting configuration in SRS

Config Default Recommended Rec w/ ASO Comments


block_size 16 (n/a) 256 Server
disk_affinity off on on Queue
init_sqm_write_delay 100 10 10 Server
init_sqm_write_max_delay 1000 50 50 Server
sqm_async_seg_delete on on on New 15.7
sqm_cache_enable on on on Server
sqm_cache_size 16 128 16 Server/Queue
sqm_cmd_cache_size 20971520 20971520 20971520 Server/Queue
sqm_max_cmd_in_block 320 320 320
sqm_page_size 4 4 4 Server
sqm_recover_segs 1 10 3 Server
sqm_write_flush on off off Server
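
The "Recommended" (non-ASO) column translates into RCL roughly as below – a sketch only, since some of these
parameters can also be scoped per connection/queue or may require a restart depending on version (PDS.pdb and
'part1' are hypothetical placeholders):

    configure replication server set init_sqm_write_delay to '10'
    go
    configure replication server set init_sqm_write_max_delay to '50'
    go
    configure replication server set sqm_cache_size to '128'
    go
    configure replication server set sqm_recover_segs to '10'
    go
    configure replication server set sqm_write_flush to 'off'
    go
    alter connection to PDS.pdb set disk_affinity to 'part1'   -- per queue
    go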

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 74


SQM Summary

Enable the page cache


• Keep the page size and cache size at default unless you are absolutely sure more is needed
• Rather than increasing the page size, consider increasing the block size
Tune the SQM read/write delays
• Use 10ms and 50ms respectively (which means 100ms will be the max)
Monitor the write speed and allocation frequency/overhead
Monitor the command cache
• Learn to differentiate just plain old long latency (e.g. DSI) from cache size issues
 For DSI, was time spent reading or waiting on results (or similar)…if it is results or some other apply aspect, it is an
RDS/RDB problem and not an SQM cache issue
 For SQT/DIST, was the SQT cache full?? If so, it is an SQT or DIST issue, not SQM cache
• Make sure you have enough page cache to support command cache size
 SQMNoDirectReplicateInSQMCache > 0
• Make sure the command cache is big enough
 SQMNoDirectReplicateInCache>0 or CacheCollisions > 0

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 75


Lab #2
RepAgent User and SQM(i)
Lab 2: Questions

Use the same detail report as from last lab

RepAgent
1. Should we change any RepAgent configs (packet size, scan_batch_size)?
2. How much would using the NRM thread help?
3. Would async parsing help more or not so much?
4. Were there any issues with write waits between RAUser & SQM?

SQM (writer)
5. Should the sqm_recover_seg be adjusted?
6. Would a larger block_size configuration help?
7. How did the rate of segment allocations/deallocations appear?
8. Was SQM page cache sufficient?

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 77


SQT (including SQMR)
Maintaining transactional serialization – a key aspect to REAL replication
SQT modules and functions

The SQT thread consists of two main modules


• SQMR – SQM Reader
 Remember, there can be multiple readers from one queue (e.g. WS + DIST)
 Consequently there is a numbering scheme based on which thread starts first
 We won’t go into it here, but the only way to tell is to look at the label in rs_statdetail
• SQT – Stable Queue Transactions
 Sorts transactions into commit sequence using a series of 4 lists
 This enforces transactional serialization – one of the key requirements for data replication
– Anything less is data synchronization based on row image at time of synchronization

SQM Reader (SQMR)


• Attempts to read from the inbound queue whenever SQT tries to fill its cache (which it does constantly)
• The read will either be a cached read from page cache or a physical read from disk
• If a physical read is required, SQMR posts the read and then the SQT thread goes to sleep
 The dAIO daemon will check for IO completion
 SQT sleeping is a bit of an issue as we will discuss later wrt the DIST thread
• If a transaction is removed from SQT cache (discussed later), the SQMR may have to re-read it from disk.

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 79


Common Problem #1: SQM Contention

When SQM is writing the block (modifying in memory or flushing to disk)


• Reader has to be blocked from reading
 Same thing happens in ASE via the MASS bit – you cannot read or write to a MASS (nK buffer) when someone else is
modifying the MASS or flushing the MASS to disk
• Blocked processes can either:
 Constantly spin and check the lock (e.g. a spinlock or mutex)
 Go to sleep and check again later (e.g. a logical lock or sleep mutex)

SQMR/SQT will sleep on contention


• The amount of time is controlled via two configuration settings
 sqt_init_read_delay (default 2000ms or 2 seconds)
 sqt_max_read_delay (default 10000ms or 10 seconds)

Contention leads to cache miss


• If attempting to read a high volume connection page, it will go to sleep quickly
• By the time it wakes up, SQM has moved on by several thousand rows or more
 At ~10-20 rows/16KB block, SQMR wakes up to find itself 100 16KB blocks behind (~2 segments)
 Soooo…it NOW has to go to sleep to await the physical read……
© 2013 SAP AG or an SAP affiliate company. All rights reserved. 80
SQMR & SQM – The “Sleep” Paradox

What really happens….


• SQMR tries to read a block…it knows how many rows were on the block last time
 This reading of the block does cause contention….but….it should be very short lived (mutex)
• If no new commands/rows were added to the block, it goes to sleep
 If this is the first attempt to read that block, the time it sleeps is sqt_init_read_delay
 When the time expires, it tries to read the block again
– If successful, it resets the timer to sqt_init_read_delay
 If the block is still being written to, it doubles the time and sleeps again
– ….and so forth until the delay is 2x sqt_max_read_delay

The problem
• If we sleep too long,
 we may have to read the block from disk
 …and since SQMR is in SQT, the SQT will sleep and not serve DIST
• If we sleep too short, we will block the writer a lot
 ….but keep the DIST fed….so not quite as bad as above – hence start low and raise configs
 ….but might result in RAWriteWaits as exec_sqm_write_request_limit is reached
– Either increase exec_sqm_write_request_limit…..or increase the SQT read delays by a few milliseconds (10-20)
© 2013 SAP AG or an SAP affiliate company. All rights reserved. 81


SQT (SQMR) Read Delay

This graph shows how the sqt_init_read_delay (10) & sqt_max_read_delay (100) parameters behave from a
steady state when the queue is not active to a peak (~2x sqt_max_read_delay, as designed) when the queue is
active and SQMR is caught up. Once the cache is full, the DIST read rate starts driving the SQMR rate as
SQMR has to wait for space in SQT cache. Did we really need 256MB of cache…likely not - just need to make
DIST faster.

[Chart annotations: SQMR is reading the current SQM block being written; SQMR lagging but 100% cached
reads as it is reading from SQM cache - SQM write cache too small??? (No); SQMR is lagging - SQT cache is
full; Primary input slows/stops - SQMR starts catching back up]

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 82


SQM contention: the solution

Decrease the read delay


• Try using values of 10 and 25 or 50 respectively
• If 25 seems too low, remember, the time you give will be doubled due to the logic
 Sooooo….a 50ms setting = 100ms wait which is 10% of a full second
 A second is an eternity to a computer

Tuning tips
• During low or medium activity, ignore
• During peak activity, watch how many Sleeps there are per attempt to read
• SleepsPerBlock = SleepForWriteQTime.counter_obs/BlocksRead (or better yet …/CmdsRead)
 If >1 per cmd consider increasing sqt_init_read_delay to average and sqt_max_read_delay to 2x avg.
– We would like to grab a full txn per read – e.g. 3 cmds/read…..or probably 5-10 sleeps/block is normal
 If 1 per block, that is acceptable…but edgy…anything <2 per block is probably getting worrisome
 0 or near 0 per block is only acceptable if BlocksReadCached is 70%+ of BlocksRead
– If cache hit ratio is high, then we are reading out of page cache and that is great
– If cache hit ratio drops, that indicates we need to do more physical reads than desirable
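
In RCL, the start-low-and-raise approach might look like this (a sketch; values from the guidance above):

    -- Start low: 10ms initial, 50ms max -> ~100ms worst case after doubling
    configure replication server set sqt_init_read_delay to '10'
    go
    configure replication server set sqt_max_read_delay to '50'
    go
    -- If RAWriteWaits appear, either raise these by 10-20ms or increase
    -- exec_sqm_write_request_limit as noted earlier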

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 83


Common Problem #2: Removed Transactions

Transactions can be removed from SQT cache


• Later discussion
This causes a double penalty for the SQMR
• First, when the commit for the removed transaction is seen, the SQMR must stop processing current
transactions and rescan the removed transaction records
 Most likely, this will require physical reads….during which the SQT will sleep pending IO
• Second, once the removed transaction is processed, the SQMR must then restart current processing where it
left off….
 …except by now, it is likely that the point where it left off is no longer in page cache, so once again, the SQT must do physical
reads.

Large transactions typically occur during batch processing


• From a DML perspective, this is a peak period
• Consequently, unless the IO subsystem is extremely fast, once the SQMR needs to reread a removed
transaction, it is unlikely to catch back up and return to cached reads unless there are lulls in
batch processing.

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 84


SQMR Monitoring
counter_id display_name description
62000 CmdsRead Commands read from a stable queue by an SQM Reader thread.
62002 BlocksRead Number of 16K blocks read from a stable queue by an SQM Reader thread.
62004 BlocksReadCached Number of 16K blocks from cache read by an SQM Reader thread.
62007 XNLReads Large messages read successfully so far. This does not count partial message, or timeout interruptions.
62008 XNLPartials Partial large messages read so far.
62009 XNLInterrupted Number of interruptions so far when reading large messages with partial read. Such interruptions happen due to time out, unexpected wakeup, or a nonblocking read request which is marked as READ_POSTED.
62010 SleepsStartQR srv_sleep() calls by an SQM Reader client due to waiting for the SQM thread to start.
62011 SQMRReadTime The amount of time taken for SQMR to read a block.
62013 SQMRBacklogSeg The number of segments yet to be read.
62014 SQMRBacklogBlock The number of blocks within a partially read segment that are yet to be read.
62015 SleepForWriteQTime The amount of time SQMR waits for a queue write.
62017 SQMRReadCacheTime The amount of time SQMR spent reading blocks from the SQM cache.

Some interesting derived stats


• Cmds/Sec = CmdsRead/seconds
• Cached Read % = BlocksReadCached/BlocksRead
• SleepsPerBlock = SleepForWriteQTime.counter_obs/BlocksRead
 More on this on next slide
• AvgSleepPerBlock = SleepForWriteQTime/BlocksRead
• msPerIO = SQMRReadTime/BlocksRead
 This is skewed low as includes cached reads but is an overall average…..
 Disk read msPerIO = (SQMRReadTime-SQMRReadCacheTime)/(BlocksRead-BlocksReadCached)
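
A hedged RSSD sketch for two of these, using the same assumed rs_statdetail layout as earlier (verify the columns
before use):

    -- Cached-read % and disk-only ms/IO per sample run
    select br.run_id,
           CachedReadPct = 100.0 * bc.counter_total / br.counter_total,
           DiskMsPerIO   = (rt.counter_total - ct.counter_total) /
                           (br.counter_total - bc.counter_total)
      from rs_statdetail br, rs_statdetail bc, rs_statdetail rt, rs_statdetail ct
     where br.counter_id = 62002     -- BlocksRead
       and bc.counter_id = 62004     -- BlocksReadCached
       and rt.counter_id = 62011     -- SQMRReadTime
       and ct.counter_id = 62017     -- SQMRReadCacheTime
       and br.run_id = bc.run_id and br.run_id = rt.run_id and br.run_id = ct.run_id
       and br.instance_id = bc.instance_id and br.instance_id = rt.instance_id
       and br.instance_id = ct.instance_id
       and br.counter_total > bc.counter_total   -- avoid divide-by-zero on all-cached runs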

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 85


Sorting transactions into commit sequence

The SQT thread uses 4 lists to sort transactions


• Prevents uncommitted transactions from being replicated
 Remember, the RepAgent just sends everything marked….not just committed
• Ensures transactional serialization (commit sequencing)
The four key SQT lists (sometimes called queues)
• OPEN – transactions for which SQT has not yet seen the commit
• CLOSED – transactions for which the SQT has seen the commit, but the SQT client (e.g. DIST) has not read
or finished reading the complete transaction yet
• READ – transactions which have been fully processed by the SQT client but are still in cache
• TRUNC – a list of all transactions in cache.
 As transactions are read into cache, they are added immediately to the trunc list
 The longest run of contiguous processed transactions from the start of the list is sent to SQM for deallocation.

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 86


SQT Txn Sorting: Seg 0 Block 0 Read (1)

Open Closed Read Truncate


TX1 TX1
BT1
I11
I12

0.6 0.5 0.4 0.3 0.2 0.1 0.0


CT1 D19 I18 I17 CT3 D35 D34 CT2 I33 U27 U32 I26 U31 I25 BT3 I24 I16 I23 D15 I22 U14 I21 I13 BT2 I12 I11 BT1

End of Queue Row 0.3.0


Row 0.3.1 Beginning of Queue
Row 0.3.2
Logical QID’s (LQIDs) Row 0.3.3
Row 0.3.4

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 87


SQT Txn Sorting: Seg 0 Block 1 Read (2)

Open Closed Read Truncate


TX1 TX2 TX1
BT1 BT2 TX2
I11 I21
I12 I22
I13
U14

0.6 0.5 0.4 0.3 0.2 0.1 0.0


CT1 D19 I18 I17 CT3 D35 D34 CT2 I33 U27 U32 I26 U31 I25 BT3 I24 I16 I23 D15 I22 U14 I21 I13 BT2 I12 I11 BT1

Row 0.3.0
End of Queue Row 0.3.1 Beginning of Queue
Row 0.3.2
Row 0.3.3
Row 0.3.4
© 2013 SAP AG or an SAP affiliate company. All rights reserved. 88
SQT Txn Sorting: Seg 0 Block 2 Read (3)

Open Closed Read Truncate


TX1 TX2 TX1
BT1 BT2 TX2
I11 I21
I12 I22
I13 I23
U14 I24
D15
I16

0.6 0.5 0.4 0.3 0.2 0.1 0.0


CT1 D19 I18 I17 CT3 D35 D34 CT2 I33 U27 U32 I26 U31 I25 BT3 I24 I16 I23 D15 I22 U14 I21 I13 BT2 I12 I11 BT1

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 89


SQT Txn Sorting: Seg 0 Block 3 Read (4)

Open Closed Read Truncate


TX1 TX2 TX3 TX1
BT1 BT2 BT3 TX2
I11 I21 U31 TX3
I12 I22 U32
I13 I23
U14 I24
D15 I25
I16 I26

0.6 0.5 0.4 0.3 0.2 0.1 0.0


CT1 D19 I18 I17 CT3 D35 D34 CT2 I33 U27 U32 I26 U31 I25 BT3 I24 I16 I23 D15 I22 U14 I21 I13 BT2 I12 I11 BT1

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 90


SQT Txn Sorting: Seg 0 Block 4 Read (5)

Open Closed Read Truncate


TX1 TX3 TX2 TX1
BT1 BT3 BT2 TX2
I11 U31 I21 TX3
I12 U32 I22
I13 I23
U14 I24
D15 I25
I16 I26
U27
CT2

0.6 0.5 0.4 0.3 0.2 0.1 0.0


CT1 D19 I18 I17 CT3 D35 D34 CT2 I33 U27 U32 I26 U31 I25 BT3 I24 I16 I23 D15 I22 U14 I21 I13 BT2 I12 I11 BT1

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 91


SQT Txn Sorting: Seg 0 Block 5 Read (6)

Open Closed Read Truncate


TX1 TX3 TX2 TX1
BT1 BT3 BT2 TX2
I11 U31 I21 TX3
I12 U32 I22
I13 I33 I23
U14 D34 I24
D15 D35 I25
DIST Reads Txn #2
I16 CT3 I26
I17 U27
I18 CT2

0.6 0.5 0.4 0.3 0.2 0.1 0.0


CT1 D19 I18 I17 CT3 D35 D34 CT2 I33 U27 U32 I26 U31 I25 BT3 I24 I16 I23 D15 I22 U14 I21 I13 BT2 I12 I11 BT1

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 92


SQT Txn Sorting: Seg 0 Block 6 Read (7)

Open Closed Read Truncate


TX1 TX2 TX3 TX1
BT1 BT2 BT3 TX2
I11 I21 U31 TX3
I12 I22 U32
I13 I23 I33
U14 I24 D34
D15 I25 D35
I16 I26 CT3 DIST Reads Txn #3
I17 U27
I18 CT2
D19
CT1
0.6 0.5 0.4 0.3 0.2 0.1 0.0
CT1 D19 I18 I17 CT3 D35 D34 CT2 I33 U27 U32 I26 U31 I25 BT3 I24 I16 I23 D15 I22 U14 I21 I13 BT2 I12 I11 BT1

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 93


SQT Txn Sorting: Seg 0 Block 6 Read (8)

Open Closed Read Truncate


TX1 TX2 TX3 TX1
BT1 BT2 BT3 TX2
I11 I21 U31 TX3
I12 I22 U32
I13 I23 I33
U14 I24 D34
D15 I25 D35
I16 I26 CT3
I17 U27 DIST Reads Txn #1
I18 CT2
SQT can finally tell SQM to
D19 remove from seg 0.0 to 0.6
CT1 and clear the Trunc list
0.6 0.5 0.4 0.3 0.2 0.1 0.0
CT1 D19 I18 I17 CT3 D35 D34 CT2 I33 U27 U32 I26 U31 I25 BT3 I24 I16 I23 D15 I22 U14 I21 I13 BT2 I12 I11 BT1

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 94


SQT Cache (the Real Picture)

Open Closed Read Truncate


TX1      TX2      TX3            TX1
BT1      BT2      BT3            TX2
I11      I21      U31            TX3
I12      I22      U32
I13      I23      I33
U14      I24      D34
D15      I25      D35            SQT Cache
I16      I26      CT3
I17      U27
I18      CT2
D19
CT1

Pointers

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 95


Empty Transactions and Transactions Removed

Empty Transactions
• Due to explicit begin/commits in PDB but no DML (e.g. chained mode/ISO3 queries, etc.)
 Used to be a big problem, but ASE 15 reduced a lot of these by not flushing empty BT/CT from ULC to primary log
• Can be due to DML on unreplicated tables
• Can also be caused by system tasks events such as reorgs
 However, system level empty transactions were filtered in ASE 12.5.x
• SQT simply discards them
 Depending on the SRS version, they may or may not be counted in CLOSED first…

Transactions Removed
• Really large transactions with a lot of commands could fill SQT cache
• To prevent this, the SQT will remove large transactions from the OPEN list if cache is low
 Note that large transactions in CLOSED or READ will not be removed from cache
• When COMMIT is seen for a removed transaction, the SQT moves it to CLOSED
• When the SQT client wants to process it, SQT must rescan the transaction from disk
 During this rescan, not only does it most likely involve a lot of physical reads, but SQT cannot simultaneously be reading new
transactions, so ongoing reads stop
• An occasional removed large transaction is not a problem….10’s of them are.

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 96


SQT Performance
counter_id display_name description
24000 CmdsRead Commands read from SQM. Commands include XREC_BEGIN, XREC_COMMIT, XREC_CHECKPT.

24002 CmdsTran Commands in transactions completely scanned by an SQT thread. Authors Note: This counter is useful for spotting large transactions (counter_max) as well as the average cmds in a transaction (counter_total/counter_obs).
24005 CacheMemUsed SQT thread memory use. Each command structure allocated by an SQT thread is freed when its transaction context is removed. For this reason, if no transactions are active in SQT, SQT cache usage is zero. Authors Note: If this reaches the maximum and remains there for any period of time without any large transactions, it is indicative of DIST being slow and SQT just buffering (can be proven by the number of CLOSED transactions).
24006 MemUsedTran Memory consumed by completely scanned transactions by an SQT thread.
24019 SQTCacheLowBnd The smallest size to which SQT cache could be configured before transactions start being removed from cache.
24020 SQTWakeupRead An SQT client awakens the SQT thread that is waiting for a queue read to complete. Turn dist_direct_cache_read on if you see this (ASO required).
24021 SQTReadSQMTime The time taken by an SQT thread (or the thread running the SQT library functions) to read messages from SQM. Authors Note: this is a wrapper around SQMRReadTime, so actual SQT read time is less.
24023 SQTAddCacheTime The time taken by an SQT thread (or the thread running the SQT library functions) to add messages to SQT cache.
24025 SQTDelCacheTime The time taken by an SQT thread (or the thread running the SQT library functions) to delete messages from SQT cache.
24031 CacheLow SQT cache is too low to load more than one transaction into cache.
24032 SQTResyncPurgedTrans Transactions purged by the resync database command.
24033 SQTControlMem Number of times the memory control is executed in SQT.
24038 SQTParseTime The amount of time, in milli-seconds, spent by SQT in parsing commands.
© 2013 SAP AG or an SAP affiliate company. All rights reserved. 97
SQT Performance Notes

Notes on individual counters


• CmdsRead
 ideally, this should be the same as with SQMR CmdsRead
 However, if a transaction is removed, when it is re-scanned, the SQMR CmdsRead will increment, but the SQT
CmdsRead will not (it will remain at 0)
 This is because the commands are not read into cache. Once removed, it stays removed. When the DIST attempts to
read a removed transaction, the SQT will rescan via the SQMR and pass the commands directly to the DIST as there is
no need to re-sort it.
• MemUsedTran
 This is useful for a rough sizing estimate – you can compute the amount of memory used by average transactions as
well as the memory used for the largest cached transaction and use these numbers as estimates for sizing if transactions
are removed.

Derived counters of interest


• SQT rate (cmds/sec) = CmdsRead/<seconds>
• Avg trans size (cmds) = CmdsTran.counter_total/CmdsTran.counter_obs
• Avg trans size (bytes) = MemUsedTran.counter_total/CmdsTran.counter_obs
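
Plugging assumed numbers into these (illustration only):

    -- Assumed 300-sec sample: CmdsRead = 360000,
    --   CmdsTran.counter_total = 359000, CmdsTran.counter_obs = 7180,
    --   MemUsedTran.counter_total = 21540000
    -- SQT rate        = 360000 / 300     = 1200 cmds/sec
    -- Avg txn (cmds)  = 359000 / 7180    = ~50 cmds per transaction
    -- Avg txn (bytes) = 21540000 / 7180  = ~3000 bytes per transaction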

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 98


SQT transaction sorting

counter_id display_name description


24001 OpenTransAdd Transactions added to the Open queue.
24009 TransRemoved Transactions whose constituent messages have been removed from memory. Removal of transactions is most commonly caused by a single transaction exceeding the available cache.
24011 TruncTransAdd Transactions added to the Truncation queue.
24012 ClosedTransAdd Transactions added to the Closed queue.
24013 ReadTransAdd Transactions added to the Read queue.
24014 OpenTransRm Transactions removed from the Open queue.
24015 TruncTransRm Transactions removed from the Truncation queue.
24016 ClosedTransRm Transactions removed from the Closed queue.
24017 ReadTransRm Transactions removed from the Read queue.
24018 EmptyTransRm Empty transactions removed from queues. Authors Note: a lot of these could be a problem….think RA latency.
24027 SQTOpenTrans Current open transaction count. Authors Note: This is an interesting counter as it can roughly expose the degree of concurrency at the PDB for DML statements, especially during batch processing.
24028 SQTClosedTrans Current closed transaction count. Authors Note: This should never be allowed to climb that high – possibly a few hundred for the DSI, but in the case of the SQT, it should not be even that high. High values (e.g. thousands) simply indicate that the next thread is slow in processing and the SQT cache is being used as a buffer.
24029 SQTReadTrans Current read transaction count. Authors Note: Typically, this will be pretty low as it represents the transactions that have been read that are still in cache and NOT the total transactions read. As transactions are truncated, they will no longer be in cache and consequently not counted.
24030 SQTTruncTrans Current truncation queue transaction count. Authors Note: This should be the sum of Open, Closed and Read, as transactions are added to this as soon as they are read into SQT cache (still OPEN) and will be counted until truncated from cache.

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 99


SQT transaction sorting notes

The four lists *Add


• Ideally, you want OpenTransAdd = ClosedTransAdd = ReadTransAdd….and ClosedTrans low
 Adjusted for EmptyTransRm
 And assuming that SQT cache is not full
• Rationale – if they are the same, then data is flowing straight through
 However, if SQT cache is full, then a tran will only be added when there is room, which will happen only when one is
read…..sooooo….if cache is full, this should be a normal state

Open & Closed


• The number of open transactions in the queue at any point is good for computing the amount of memory
needed – e.g. avg tran size in bytes * number of open transactions (if large). Otherwise, it only really
indicates the amount of concurrency at the primary
• The CLOSED list is critical for determining SQT bottlenecks
 If this is very high (e.g. 500+ ….especially into the thousands), then the DIST is lagging (for inbound queue) and the SQT
cache is simply a buffer that will eventually fill if the DIST lag persists for any period of time.
 For the SQT thread, if you see a large number of transactions CLOSED, check dist_direct_cache_read (ASO option
required). If it is on, then the DIST may be lagging due to slow outbound queues or too many destinations.

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 100


Example SQT: Impact of empty transactions

Inbound SQT Cache Txn Queues


Source Connection: SERVER.database (102)

Interval OpenTxnAdd TxnRemoved ClosedAdd EmptyTxnRm ReadTxnAdd TruncTrAdd


------------------------------ ---------- ---------- ---------- ---------- ---------- ----------
(1) 15:19:52 -> 15:36:06 166671 0 130883 35782 129681 166671
(2) 15:36:07 -> 15:44:45 201610 0 157598 44018 179226 201610
(3) 15:44:46 -> 15:53:20 225184 0 175940 49241 179986 225184
(4) 15:53:21 -> 16:01:53 264745 0 207058 57697 212724 264745
(5) 16:01:54 -> 16:10:20 337649 0 264678 72977 265329 337649
(6) 16:10:21 -> 16:18:52 277224 0 217166 60060 208832 277224
(7) 16:18:53 -> 16:27:18 406881 0 317854 89022 328692 406881
(8) 16:27:19 -> 16:35:40 492036 0 385000 107031 384124 492036
(9) 16:35:41 -> 16:44:03 407924 0 319120 88812 321529 407924
(10) 16:44:04 -> 16:52:29 321035 0 250336 70703 250073 321035
(11) 16:52:30 -> 17:00:50 200070 0 156562 43513 156621 200070
(12) 17:00:51 -> 17:09:15 151736 0 119397 32359 119435 151736
(13) 17:09:16 -> 17:17:38 149918 0 117857 32061 117875 149918
(14) 17:17:39 -> 17:25:57 137120 0 106931 30187 106955 137120
(15) 17:25:58 -> 17:34:18 132493 0 103336 29158 103336 132493
------------------------------ ---------- ---------- ---------- ---------- ---------- ----------
3872296 0 3029716 842621 3064418 3872296

Adding more SQT cache shouldn’t even be considered due to no TransRemoved. However, in interval 1, the
DIST was lagging as can be seen by the lower value for ReadTransAdd vs. ClosedTransAdd – but it started
catching up by interval 2 & 3 – and especially by mid-way through the sample periods
© 2013 SAP AG or an SAP affiliate company. All rights reserved. 101
Example SQT: Impact of empty transactions (cont)

Inbound SQT Cache Txn Queues


Source Connection: SERVER.database (102)

Interval TxnRemoved OpenTrans ClosedTran ReadTrans TruncTrans


------------------------------ ---------- ---------- ---------- ---------- ----------
(1) 15:19:52 -> 15:36:06 0 4 28507 12 28524
(2) 15:36:07 -> 15:44:45 0 2 22377 8 22387
(3) 15:44:46 -> 15:53:20 0 2 5481 11 5495
(4) 15:53:21 -> 16:01:53 0 1 4476 10 4488
(5) 16:01:54 -> 16:10:20 0 2 439 10 451
(6) 16:10:21 -> 16:18:52 0 2 7073 7 7084
(7) 16:18:53 -> 16:27:18 0 2 1761 8 1772
(8) 16:27:19 -> 16:35:40 0 2 814 8 825
(9) 16:35:41 -> 16:44:03 0 2 132 10 145
(10) 16:44:04 -> 16:52:29 0 2 765 7 774
(11) 16:52:30 -> 17:00:50 0 1 1 6 9
(12) 17:00:51 -> 17:09:15 0 2 51 4 58
(13) 17:09:16 -> 17:17:38 0 1 1 7 10
(14) 17:17:39 -> 17:25:57 0 1 0 7 9
(15) 17:25:58 -> 17:34:18 0 2 1 8 12
------------------------------ ---------- ---------- ---------- ---------- ----------
0 28 71879 123 72043

This must be a huge SQT cache as we were simply buffering ~30,000 transactions due to latency in the DIST
processing. Note that ReadTrans will be low as transactions are discarded from cache as soon as truncated.
However, given the resurging increases in ClosedTrans at #6 and #10, the problem is recurring rather than acute.
© 2013 SAP AG or an SAP affiliate company. All rights reserved. 102
DSI/SQT HVAR/RTL command parsing

counter_id display_name description


24034 SQTPrsCheckout Number of SQT pre-parsed commands whose memory was counted against sqt_max_prs_size.
24035 SQTPrsCmdIn Number of SQT pre-parsed commands whose memory is counted against sqt_max_prs_size and currently kept in memory waiting to be read/consumed by a client.
24036 SQTPrsCmdOff Number of SQT pre-parsed commands released early due to sqt_max_prs_size memory consumption control.
24037 SQTPrsMemory Memory size consumed by SQT pre-parsed commands whose memory is counted against sqt_max_prs_size and currently kept in memory waiting to be read/consumed by a client.
24038 SQTParseTime The amount of time, in milli-seconds, spent by SQT in parsing commands.

To be discussed later time permitting


• Normally, the SQT contains unparsed transactions. The only thing the SQT needs to know is which
transaction the operation belongs to.
• When compiling transactions for the DSI when using HVAR/RTL, the SQT cache also fully parses the
command
 We need both the operation (for compilation) and the values
• The compilation database (CDB) is created from scanning the parsed commands

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 103


SQT Feature Map

Feature                     ASE   RAX*   RS        ASO Req’d   Comments
SQM page cache              Any   Any    15.1+
Large block sizes           Any   Any    15.5+     Yes         16, 32, 64, 128, 256
SQM command cache           Any   Any    15.7+
PRS parsed command cache    Any   Any    15.7.1+   ~           See below

No real new SQT features in recent releases


• The bottlenecks were mainly due to SQM caching, etc.

PRS parsed command cache (sqt_max_prs_size) (DSI/SQT)


• This really only applies to the DSI aspect of SQT cache and even then is really only used by connections with
dsi_compile_enable=true (HVAR/RTL)
• Typically, the SQT module doesn’t need to parse commands, but for the DSI, the DSI parses commands in the
SQT cache and stores them in the PRS cache as part of the compilation process
 As a result, the new default for sqt_max_prs_size is now 2GB (2 billion bytes)
 This needs to be factored into memory consumption/allocation planning in addition to CDB size
• Not sure if there is any benefit to normal SQL mapping for non-HVAR/RTL

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 104


SQT starting configuration in SRS

Config Default Recommended Rec w/ ASO Comments


dist_sqt_max_cache_size 0 256MB 256MB Connection
dsi_sqt_max_cache_size 0 128MB 512MB Connection
sqt_init_read_delay 2000 10 10 Server
sqt_max_cache_size 20971520 (20MB) 32MB 32MB Server
sqt_max_prs_size 2147483647 2147483647 2147483647 Chg in 15.7+
sqt_max_read_delay 10000 50 50 server

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 105


SQT Summary

Monitor the SQT cache


• Watch TransRemoved
• OpenTransAdd = ClosedTransAdd = ReadTransAdd + EmptyTransRm
 If SQTCache isn’t full
• SQTClosedTrans <500 and definitely <1000
 If not, it indicates DIST slowness

Tuning tips
• DO NOT ADD SQT CACHE UNLESS TransRemoved > ~3-5+
 ….especially if SQTClosedTrans > 100
 If you do, you are just masking the real problem by increasing the “buffering”…it will still fill
• Use SQM’s CmdSize * SQT’s CmdsTran.counter_max * max(3, SQTOpenTrans) to compute the necessary SQT
cache (see the sketch after this list)
 Note that some transactions simply are too large to reasonably cache (e.g. 1M cmds)
 Consider a 100K cmd transaction with a 1K cmd size…it would take 102,400,000 bytes (~100MB) to
cache…consequently a 512MB cache is probably sufficient for an absolute upper bounds for an inbound SQT cache
unless you have a lot of really large transactions and you have a lot of memory.
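
A hedged RCL sketch of that sizing rule (the 1KB command size, 100-cmd max transaction and 10 open transactions are
assumed inputs; RDS.rdb is a placeholder connection):

    -- Sizing rule: CmdSize * CmdsTran.counter_max * max(3, SQTOpenTrans)
    --   e.g. 1024 bytes * 100 cmds * 10 open txns = ~1MB -> round up generously
    configure replication server set sqt_max_cache_size to '33554432'       -- 32MB
    go
    suspend connection to RDS.rdb
    go
    alter connection to RDS.rdb set dsi_sqt_max_cache_size to '134217728'   -- 128MB
    go
    resume connection to RDS.rdb
    go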

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 106


Distributor Processing
The “brains” for subscriptions
DIST Processing

The DIST thread has 5 major functions


• Read commands from SQT
• Parse the commands (necessary for subscription resolution, etc.)
• Perform subscription resolution
• Ensure the transactional serialization for the target
• Deliver/write the commands to the target outbound queues
Most of the time, people really only know 3 of these via key modules
• SRE – Subscription Resolution Engine
 Performs subscription resolution and migration
• TD – Transaction Delivery
 Re-sequences the transaction commands based on the commit record to enforce serialization
• MD – Message Delivery
 Writes the commands into the appropriate outbound queues

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 108


DIST processing illustration

[Diagram: DIST processing within the PRS. (1) The DIST issues a read request to the SQT (O/C/R/T lists of
cmd structures); (2) the SQT returns the packed binary cmd; (3) the SRE resolves subscriptions; (4) TD
sequences the transaction; (5) MD packs the command and places the packed ascii cmd on the
md_sqm_write_request queue, which feeds the outbound SQM (o).]

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 109


Inter-Thread Communication

There are 3 methods available A


• Via shared memory Memory/Caches
Fastest, but requires adding mutexes, etc.
• OpenClient callbacks
Requires coding handlers and is somewhat restrictive in
concurrent events
OpenClient
• Standard IPC message queues B Callback
Similar to inter-app messaging (e.g. JMS) but uses in-
memory queues

SRS uses all three


• Initially, the predominant inter-thread communication
was via message queues
• Callbacks were reserved for exception handling, etc. OpenServer
• After SMP release, more in-memory sharing added
C Message Queues
via extension of mutexes

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 110


Inter-thread Messaging

OpenServer Messaging
• In-Memory message queues A
 Provides means for asynchronous processing
 Each thread may interact with 1 or more message queues Memory/Caches
 Provides a cache to reduce impact of ‘surges’ in processing
– exec_sqm_write_request_limit
– md_sqm_write_request_limit
– exec_nrm_request_limit
• RS configurations:
 num_msgqueues OpenClient
 num_msgs
B Callback
• Configuring:
 If too low, RS will crash
 Shouldn’t have to tune, unless running a lot of connections

RS Latency
• The queues do not ‘belong’ to a thread…
 and therefore the thread does not have to be running for another
thread to act on the queue
• …but, for a back & forth exchange, thread OpenServer
sequencing & execution becomes an issue C Message Queues

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 111


Problem #1: DIST latency due to SQT sleeping

DIST requests txns from SQT


• When it finishes previous txn, requests next one from
SQT
SQM command cache
• SQT looks in cache and sends txn header
• DIST requests each command in sequence cmd direct replication
• SQT sends each command in sequence
O C R T
• DIST tells SQT when complete SQT transaction cache

The problem is SQT is frequently sleeping


• Either due to awaiting physical read
• …or due to SQMR contention with SQM

DIST can “wake up” SQT but…


• This takes a bit of time (context switch)
• Overall result is that DIST processing lags
• …which means it is likely that cmd direct replication
may miss due to overwritten cache entries…
• ….which increases DIST parsing time

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 112


The solution: dist_direct_cache_read

DIST reads directly from SQT cache


• Still uses messaging for txn coordination
• Reads txn cmds directly from SQT cache SQM command cache

• Requires ASO option cmd direct replication


• DSI has always done this (no ASO reqd)
O C R T
This was due to early implementation of DSI as single SQT transaction cache
thread

There are counters to detect this


• DISTReadTime
high read time is a key clue this is a problem
unless the transaction was removed from cache
If rescanning a large txn, this could be high naturally due
to PIO to do the read
• SQTWakeupRead
SRS 15.0+
This is a definite indicator

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 113


Throughput impact on 1M row batch job (single txn)

[Chart: ~85% increase and ~45% increase in throughput; 30 sec sample intervals]


© 2013 SAP AG or an SAP affiliate company. All rights reserved. 114
Latency impact on 1M row batch job (single txn)

[Chart: 0:02:00 decrease in latency; 30 sec sample intervals]

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 115


SRE Performance

Usually, this isn’t an issue if sts_cachesize > 2000


• If sts_cachesize is too small, the chances of a cache miss increase
 Remember, the STS cache affects the entire SRS – so a RepAgent normalization could cause an object to get removed before
the DIST processes an older record
• sts_cachesize greater than 5000 is likely not needed
 The occasional cache miss shouldn’t be an issue
• Do not use sts_full_cache_rs_objects/rs_subscriptions on large schemas
 This will increase startup time significantly.

One issue that is a consideration is the LACK of repdefs


• If there is no repdef, there can be no subscription….therefore SRE has to discard the row
 Key counter for this is SREstmtsDiscard
• Note that MSA uses database repdefs and current RS MC only counts lack of table repdefs….
• ….however, unless using ASE 15.7 with repdef elimination, the lack of a table repdef even with MSA is a
performance issue at the DSI (later discussion)

The overall key for DIST (including SRE, TD & MD) is focus on time counters

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 116


TD & MD performance

TD performance is rarely an issue


• However, if MD is slow, it will show up as TD time
MD performance
• As with any queue writer, a lot depends on the size of the write request queue and speed of the device
 md_sqm_write_request_limit is equivalent as with exec_sqm_write_request_limit
• However, an additional consideration is the number of targets for each subscribing row
• Consider the following examples:
 Single source with 10 targets (no routes)
– If source does 1,000 cmds/sec, the DIST has to maintain 10,000 cmds/sec
 Single source with 10 targets, but using 3 routes to 3 RS’s with 3-4 targets each
– If source does 1,000 cmds/sec, each DIST only needs to do ~3,000 cmds/sec

RS MC counters track time and waits


• See DIST counter details on next slides

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 117


DIST throughput counters
Counter_id Display_name Description
30000 CmdsRead Commands read from an inbound queue by a DIST thread.
30002 TransProcessed Transactions read from an inbound queue by a DIST thread.
30004 Duplicates Commands rejected as duplicates by a DIST thread.
30006 CmdsIgnored Commands ignored by a DIST thread while it awaits an enable marker.
30008 CmdsMaintUser Commands executed by the maintenance user encountered by a DIST thread.
30010 CmdsDump Dump database commands read from an inbound queue by a DIST thread.
30011 CmdsMarker rs_markers placed in an inbound queue. rs_markers are enable replication, activate, validate, and dump markers.
30013 CmdsNoRepdef Commands encountered by a DIST thread for which no replication definition exists. Authors Note: For non-MSA applications, this is a sign that a table is marked that shouldn’t be and data is likely discarded. For MSA applications, this is a key clue that updates and deletes will likely be slower due to where clause construction and a repdef might help.
30015 UpdsRslocater Updates to the RSSD..rs_locater table by a DIST thread. A DIST thread performs an explicit synchronization each time a SUB RCL command is executed.
30024 TDbegin Begin transaction commands propagated by a DIST thread.
30025 TDclose Commit or Rollback commands processed by a DIST thread.
30026 RSTicket rs_ticket markers processed by a DIST thread.
30027 dist_stop_unsupported_cmd dist_stop_unsupported_cmd config parameter.
30032 SqtMaxCache dist_sqt_max_cache_size config parameter.
30037 DISTCmdsDirectRepRecv Number of commands received by DIST directly from EXEC
30038 DISTCmdsDirectRepSend Number of commands directly sent from DIST with pre-packing data format.

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 118


Notes on DIST throughput counters

Individual counter notes


• UpdsRslocater
 unlike RepAgent User and SQM modules, there isn’t anything you can really do to tune this.
• Duplicates
 There are always duplicates during recovery. The trick is to determine when too many have gone through. One way
would be to use sqm_recover_seg and the number of cmds/block and sqm_page_size to estimate the rough number of
duplicates.
• DISTCmdsDirectRepRecv
 This can only be compared by looking at the RepAgent RACmdsDirectRepSend value

Derived counters
• Cmds/sec = CmdsRead/(sample duration in seconds)
• DirectRepPct = (DISTCmdsDirectRepRecv/RACmdsDirectRepSend *100.0)
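
As a sketch against the RSSD (same assumed rs_statdetail layout as earlier; note the join is on run_id only, since the
two counters live on different thread instances):

    select d.run_id,
           DirectRepPct = 100.0 * d.counter_total / r.counter_total
      from rs_statdetail d, rs_statdetail r
     where d.counter_id = 30037      -- DISTCmdsDirectRepRecv
       and r.counter_id = 58040      -- RACmdsDirectRepSend
       and d.run_id = r.run_id
       and r.counter_total > 0
     order by d.run_id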

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 119


DIST SRE counters
Counter_id Display_name Description
30000 CmdsRead Commands read from an inbound queue by a DIST thread.
30002 TransProcessed Transactions read from an inbound queue by a DIST thread.
30013 CmdsNoRepdef Commands encountered by a DIST thread for which no replication definition exists. Authors Note: For non-MSA applications, this is a sign that a table is marked that shouldn’t be and data is likely discarded. For MSA applications, this is a key clue that updates and deletes will likely be slower due to where clause construction and a repdef might help.
30016 SREcreate SRE creation requests performed by a DIST thread. This counter is incremented for each new SUB.
30017 SREdestroy SRE destroy requests performed by a DIST thread. This counter is incremented each time a SUB is dropped.
30018 SREget SRE requests performed by a DIST thread to fetch an SRE object. This counter is incremented each time a DIST thread fetches an SRE object from SRE cache.
30019 SRErebuild SRE rebuild requests performed by a DIST thread.
30020 SREstmtsInsert Insert commands encountered by a DIST thread and resolved by SRE.
30021 SREstmtsUpdate Update commands encountered by a DIST thread and resolved by SRE.
30022 SREstmtsDelete Delete commands encountered by a DIST thread and resolved by SRE.
30023 SREstmtsDiscard DIST commands with no subscription resolution that are discarded by a DIST thread. This implies either there is no subscription or the 'where' clause associated with the subscription does not result in row qualification. Author’s Note: If you see a lot of these (e.g. 10% or more of total), this is an indication that you might want to consider using sp_setreptable <tablename>, ‘never’ on tables you aren’t replicating….although it also may point to a narrow subscription (e.g. where status=‘complete’).
30027 dist_stop_unsupported_cmd dist_stop_unsupported_cmd config parameter.
30033 DISTSreTime The amount of time taken by a Distributor to do SRE resolve.

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 120


DIST time counters (these are the important ones!)
Counter_id Display_name Description
30000 CmdsRead Commands read from an inbound queue by a DIST thread.
30002 TransProcessed Transactions read from an inbound queue by a DIST thread.
30028 DISTReadTime The amount of time taken by a Distributor to read a command from SQT cache.
30030 DISTParseTime The amount of time taken by a Distributor to parse commands read from SQT.
30033 DISTSreTime The amount of time taken by a Distributor to do sre resolve.
30035 DISTTDDeliverTime The amount of time taken by a Distributor to call TD delivering commands. Author’s Note: DISTTDDeliverTime includes DISTMDDeliverTime (cumulative).
30039 DISTTDPackTime The amount of time taken by a Distributor in packing of commands by TD.
30041 DISTMDDeliverTime The amount of time taken by a Distributor in delivering of messages by MD.
30043 DISTMDProcessCmdTime The amount of time taken by the MD module of the Distributor for processing the cmds – mainly _md_construct_routing and _md_pack_component.
30045 DISTMDSQMWriteMsgTime The amount of time taken by the MD module of the Distributor in queueing write messages for the SQM thread.
30047 DISTMDWriteWaitsTime The amount of time taken by DIST's MD to wait for SQM writes.

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 121


DIST Feature Map

Feature                          ASE   RAX*   RS      ASO Req’d   Comments
Add route/intermediate routes    Any   Any    Any                 >5 targets for same source
dist_direct_cache_read           Any   Any    15.5+   Yes
SQM command cache                Any   Any    15.7+               Direct cmd replication
© 2013 SAP AG or an SAP affiliate company. All rights reserved. 122


DIST starting configuration in SRS

Config Default Recommended Rec w/ ASO Comments


dist_cmd_direct_replicate off on on
dist_direct_cache_read off (n/a) on
dist_sqt_max_cache_size 0 256MB 256MB
md_sqm_write_request_limit 1048576 8388608 8388608
sts_cachesize 1000 10000 10000
sts_full_cache_rs_columns on? Off, unless not using table repdefs
sts_full_cache_rs_objects on? Off, unless not using table repdefs
sts_full_cache_rs_objfunctions on? Off, unless not using table repdefs

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 123


DIST Summary

Tune the STS cache


• Set sts_cachesize between 2000 and 5000
Enable dist_direct_cache_read
• On non-ASO, try to eliminate PIO by the SQMR and shrink sqt_init_read_delay as mentioned earlier, so that
SQT isn’t sleeping as much, or as long when it does.
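
In RCL, that starting point might look like the sketch below (dist_direct_cache_read is shown server-wide here; verify
whether your version scopes it to the server or the connection):

    configure replication server set sts_cachesize to '5000'
    go
    configure replication server set dist_direct_cache_read to 'on'   -- ASO required
    go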

Watch the RS MC DIST*TIME counters


• DISTReadTime
 Could be an indication that either dist_direct_cache_read is off or SQT is rescanning a large transaction that was removed.
• DISTParseTime
 If it goes high, watch the SQM command cache (collisions and other issues)
• DISTTDDeliverTime
 For SRS 15.6 and earlier, use this as if it were DISTMDDeliverTime, as it includes it in its time tracking
• DISTMDDeliverTime (15.7+)
 Slow disks or a lot of different targets
• DISTMDWriteWaitsTime (15.7+)
 You don’t want any waits here at ALL….it will backlog into the SQT cache

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 124


RSI & RSI User Processing
Routes and Performance
Remember this???

[Diagram: the high-level data flow extended across a route (steps 1-15). PRS: EXEC -> SQM -> IBQ -> SQT ->
DIST -> SQM -> OBQ -> RSI. Across the network to the RRS: RSI User -> DIST -> SQM -> OBQ -> DSI-S ->
DSIEXEC.]
© 2013 SAP AG or an SAP affiliate company. All rights reserved. 126


Routing & Performance

Use a route any time a WAN is involved – no matter how short the WAN
• Using routes generally benefits performance and reliability in WAN environments
Consider a route in a LAN for performance and robustness if between buildings, etc.
• It also can alleviate contention between DSI and other threads on internal structures (e.g. STS cache)
If you want really low-latency in a LAN, a route may not help
• Direct command replication is not possible across a route
 RepAgent -> DIST: direct command replication possible
 DIST -> DSI: direct command replication possible
 RSI -> network (RTL) -> RSI User….direct command replication not possible due to the network
• So, if source & target are in the same datacenter a route may add a bit of latency
• HOWEVER: This needs to be carefully considered
 If there is any latency at all in the DSI/DSIEXEC due to RDB execution speed, this argument fails quickly
 Due to inherent network speed issues, this likely only works best when source/target are on same subnet

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 127


An unfortunate, but all too common implementation

The Route Bottleneck

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 128


A bit better, but still…..

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 129


A better picture….MPR dedicated routes
…an order of magnitude less contention on queues

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 130


(PRS) RSI and (RRS) RSIUser monitor counters
counter_id display_name description
4000 BytesSent Bytes delivered by an RSI sender thread.
4002 PacketsSent Packets sent by an RSI sender thread.
4004 MsgsSent RSI messages sent by an RSI thread. These messages contain the distribute command.
4005 MsgsGetTrunc RSI get truncation messages sent by an RSI thread. This count is affected by the rsi_batch_size and rsi_sync_interval configuration parameters.
4006 FadeOuts Number of times that an RSI thread has been faded out due to inactivity. This count is influenced by the configuration parameter rsi_fadeout_time.
4007 BlockReads Number of blocking (SQM_WAIT_C) reads performed by an RSI thread against the SQM thread that manages an RSI queue.
4009 SendPTTime Time, in milli-seconds, spent in sending packets of data to the RRS.
4015 RSIReadSQMTime The time taken by an RSI thread to read messages from SQM.
4017 RSTicket rs_ticket markers processed by an RSI thread.
4018 UnpackedCmd Total commands unpacked by an RSI thread.
59000 RSIUCmdsRecv Commands received by an RSI User thread. Includes RSI messages, get truncation requests, etc.
59001 RSIUMsgRecv RSI messages received by an RSI User.
59002 RSIUGetTRecv Get truncation requests received by an RSI User.
59003 RSIURebldQRecv Rebuild queues requests received by an RSI User.
59004 RSIUSetTRecv Set truncation requests received by an RSI User.
59005 RSIUCmdLen Length of an RSI command.
59006 RSIUSendGetT The amount of time, in milli-seconds, spent responding to 'get truncation' requests.
59008 RSIUSendSetT The amount of time, in milli-seconds, spent responding to 'set truncation' requests.
59010 RSIURecvPckt The amount of time, in milli-seconds, spent receiving network packets.
59013 RSIUBuffsRcvd Number of command buffers received by an RSI User thread. Buffers are broken into packets when in 'passthru' mode, or language 'chunks' when not in 'passthru' mode. See counter 'RSIUPcktsRcvd' for these numbers.
59014 RSIUEmptyPckts Number of empty packets received in 'passthru' mode by an RSI User thread. These are 'forced' EOMs. See counter 'RSIUPcktsRcvd' for these numbers.
59015 RSIUConnPcktSz The connection packet size for the RSI User.
59016 RSIUBytsRcvd Bytes received by an RSI User thread. This size includes the TDS header size when in 'passthru' mode.
59017 RSIUExecTime The amount of time, in milli-seconds, the RSI User thread is scheduled by OCS.
© 2013 SAP AG or an SAP affiliate company. All rights reserved. 131
Route guidance is very simple

Don’t create a bottleneck with too many connections per route


• Too many source connections using the same route will cause DIST write latency
• Use MPR and create multiple dedicated routes
Configure the route so that truncation synchronization is ~1 per 15-20 seconds
• The default configurations are way too low
 rsi_packet_size default is 4096 -> try 16384, especially if the network has jumbo packets enabled
 rsi_batch_size default is 262144 (256KB) -> the minimum should ideally be ~8-32MB
– If processing 1000 1KB cmds/sec or 1MB/sec, it is sync’ing 4x per second at the default…..about 200x what it should be
– Even at 8MB, it is sync’ing every 8 seconds at 1000 cmds/sec….at 5K cmds/sec, it is still ~1/sec which is too low
– At 32MB (33554432), it is syncing every ~6 secs at 5000 cmds/sec….still pretty frequent…but……
– Max is 128MB
 Make sure the server-level md_sqm_write_request_limit is at least 8MB if not the same as rsi_batch_size
– This parameter cannot be tuned for a route as it can for a connection, so setting the server limit causes it to be inherited
• This is far more than just an RSSD interaction issue
• Each RSI get_trunc()/set_trunc() results in RRS forcing the SQM block to flush
 SQM counter WritesForceFlush (6039)
 The block flush slows down the RRS RSIUser write speed
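
A hedged RCL sketch of this guidance, using the recommended values from the configuration slide that follows
(RRS_NAME is a placeholder for the destination Replication Server):

    suspend route to RRS_NAME
    go
    alter route to RRS_NAME set rsi_packet_size to '16384'
    go
    alter route to RRS_NAME set rsi_batch_size to '16777216'    -- 16MB
    go
    resume route to RRS_NAME
    go
    -- server-wide; inherited by routes
    configure replication server set md_sqm_write_request_limit to '8388608'
    go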

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 132


The unfortunate part

Currently, DIST counters are not recorded for the RSIUser thread
• …of course, remember, we are only hitting the MD module….
• …but still, some of the MD write waits/times would be interesting

The analysis really is intended for one SRS at a time
• The first problem is that rs_config doesn’t incorporate the SRS id….so you can’t have two sets of configs loaded
• The second problem is that many of the catalog tables are replicated
 Consequently you possibly could end up with duplicate rows (or dupe key errors) on loading
 A lot depends on whether routes are created in each direction

Soooo….the key focus is to load the counters for the SRS where the latency is
• ….and then look at how many truncation synchronizations per second there are
 Sender or receiver, the effect would be the same
RSI starting configuration in SRS

Config Default Recommended Rec w/ ASO Comments
rsi_batch_size 262144 16777216 16777216
rsi_fadeout_time -1 -1 -1
rsi_packet_size 4096 16384 16384 Leverage jumbo frames
rsi_sync_interval 60 60 60
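
To apply these, a minimal RCL sketch from isql; the route name RRS_NAME is a placeholder for your destination SRS, and md_sqm_write_request_limit is set server-wide because, as noted earlier, it cannot be tuned per route:

    alter route to RRS_NAME set rsi_batch_size to '16777216'
    go
    alter route to RRS_NAME set rsi_packet_size to '16384'
    go
    configure replication server set md_sqm_write_request_limit to '8388608'
    go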
Lab #3
SQMR, SQT & DIST

Lab 3: SQMR/SQT & DIST

Use the same detail report as from the last lab

SQMR Analysis
1. How effective is the page cache (cached block reads)?
2. Should we increase or decrease sqt_init_read_delay/sqt_max_read_delay?

SQT Cache Analysis
3. Were any large transactions removed?
4. Should we increase dist_sqt_max_cache_size/sqt_max_cache_size?
5. Is SQT just buffering transactions due to DIST performance lagging?

DIST Analysis
6. Would dist_direct_cache_read help? How can you tell?
7. Are there any issues with repdefs?
 e.g. are we lacking any?
 Is it possible that tables are marked for rep that shouldn’t be?
8. Is the MD write queue sized correctly?
9. How is the write speed to the outbound queue?
Outbound SQM Processing
The first stage in outbound processing

Outbound Queue SQM

Behaves identically to the inbound queue
• As such, the same counters and rules apply

Some operational differences
• Much more likely to have a backlog due to DSI  RDB slowness
• As a result, much more likely to have physical reads from disk
 In other words, direct command replication is not as likely, although possible if there is no latency

A common bad behavior – monitoring via admin who, sqm (command shown after this list)
• It is “okay” if just trying to see if the queue is moving or how big the backlog is….
• But the common impression that if Next.Read > Last Seg.Block the queue is caught up is NOT quite correct
 It *is* true that the queue reader is caught up….
 ….but if the DSI has a large SQT cache, there could still be a huge backlog just in memory
• In other words – it is good for a quick adhoc check….
• ….but if you poll it every n minutes and send the output to someone for analysis, you are wasting both your time and theirs.
 All it proves is that you had a backlog. Duh!!!
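
For reference, the quick ad hoc check itself (run from isql against the SRS; it shows queue movement and on-disk backlog only, not the in-memory SQT backlog):

    admin who, sqm
    go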
DSI Processing
Packaging & scheduling transactions for the replicate

Data Server Interface (DSI) modules (1)

DSI/SQMR
• Just like with SQT/SQMR, it reads transactions from the queue
DSI/SQT
• Just like with the DIST/SQT, it sorts transactions into commit sequence order
 Remember, a single destination may be the target of multiple sources....
 The transactions from different sources may be intermingled
 In addition, in the case of a WarmStandby DSI, the DSI SQT does the heavy lifting sort ala the SQT
• One difference from DIST/SQT is that the SQT cache is used for….
 DSI transaction grouping
 DSI compilation for HVAR/RTL
 ….the CLOSED list is the source for both of these functions
• Another subtle difference is the transaction profiling
 Determines whether contiguous inserts in the same transaction can be sent via bulk inserts
 Arguably, this is a module unto itself….but…..it is implemented within the core SQT module
– Note that the inbound queue DIST/SQT really doesn’t need this functionality, so it skips it….mostly – except SQLDML
• Thirdly, the sqt_prs_cache_size is used to cache parsed commands for DSIHQ
Data Server Interface (DSI) modules (2)

DSIHQ
• This is the module that actually does the HVAR/RTL processing
• Compiles the net changes into a Compilation Database (CDB)
 The CDB is an in-memory collection of data – not an end-user database
 The compiled net changes are in the CDB
– With pointers back to the parsed commands in the SQT/PRS cache

DSI Scheduler (aka DSI-S)
• Does the transaction grouping and dispatch (especially when parallel DSI)
• Tells the DSIEXEC how to send the transaction
 Parallelism/Serialization coordination
 Bulk
 HVAR/RTL
• Controls begin/commit sequencing and rollback processing
The big picture of the DSI (remember this?)

[Diagram recap: DIST/SQT packs commands (SRE cmd structure  packed ascii/binary cmd) and hands them to the outbound SQM via the md_sqm_write_request queue; the SQM stages blocks in its page cache (current block) and writes the packed ascii cmds to the outbound queue; the DSI SQT cache holds the packed cmds; the DSIEXEC unpacks each cmd structure and generates the SQL.]
DSI processing details (w/ ASO or RTL)

[Diagram recap: the DSI SQT cache (Open/Closed/Read/Truncate lists) feeds the DSIHQ CDB; the DSIEXEC(s) do a direct cache read of the commands from the DSI SQT cache (ASO not required); only the transaction dispatch message (not the txn cmds) flows through the DSI message queue to the DSIEXEC(s).]
Comments on DSI SQT Cache

Frequently oversized.
The problem is due to monitoring via admin who,sqm
• Admin commands only report that module’s statistics and not the entire system
• RS courses teach that admin who,sqm can measure latency via Last Seg.Block vs. Next Read.
 This was reasonably true when SRS didn’t use much memory for SQT cache, as the amount of backlog cached was minimal compared to the disk space often in the queue due to latency

The problem is exacerbated by adding more memory to SQT cache.
• When the DSI is suspended to change the cache size, the SQT cache is cleared
• When restarted, even more transactions can be read into cache, so the misinterpretation is that it is processing faster
• Unless the transaction volume is consistently high, the admin will assume the SQT cache increase resolved the issue
• If transaction volume is consistently high, the admin will notice the backlog increasing…
 ….and assume since the first increase “helped”, that even more memory is needed…..
 ….or will get frustrated

One of the most difficult aspects is getting admins to stop relying on admin who,sqm
Example DSI SQT cache: The mythical full cache

Outbound DSI (Inbound WS-DSI) SQT Cache Memory
Destination Connection: SERVER.database (117)

Interval CmdsRead CmdsPerSec CmdMaxTran CmdAvgTran CacheMem
------------------------------ ---------- ---------- ---------- ---------- ----------
(1) 15:19:52 -> 15:36:06 212319 218.2 27 3 1024.00
(2) 15:36:07 -> 15:44:45 261447 503.7 25 3 1024.00
(3) 15:44:46 -> 15:53:20 294550 573.0 25 3 1024.00
(4) 15:53:21 -> 16:01:53 236404 461.7 25 3 1024.00
(5) 16:01:54 -> 16:10:20 429449 847.0 224 3 1024.00
(6) 16:10:21 -> 16:18:52 253626 496.3 25 3 1024.00
(7) 16:18:53 -> 16:27:18 370260 733.1 26 3 1024.00
(8) 16:27:19 -> 16:35:40 342405 683.4 25 3 1024.00
(9) 16:35:41 -> 16:44:03 414697 826.0 27 3 1024.00
(10) 16:44:04 -> 16:52:29 333716 660.8 25 3 1024.00
(11) 16:52:30 -> 17:00:50 324586 649.1 25 3 1024.00
(12) 17:00:51 -> 17:09:15 465746 924.0 25 3 1024.00
(13) 17:09:16 -> 17:17:38 199395 396.4 25 3 1024.00
(14) 17:17:39 -> 17:25:57 482443 966.8 28 3 1024.00
(15) 17:25:58 -> 17:34:18 371331 742.6 51 3 1024.00
------------------------------ ---------- ---------- ---------- ---------- ----------
4992374 645.4 224 3 1024.00
We have a 1GB DSI SQT cache and it is fully used – likely there are messages in the errorlog about the cache being full, or admin who,sqt will show Filled=1, which may cause the DBA to consider adding more cache…..but….
Example DSI SQT cache: Simply buffering due to slow RDB

Outbound DSI (Inbound WS-DSI) SQT Cache Memory
Destination Connection: SERVER.database (117)

Interval TxnRemoved OpenTrans ClosedTran ReadTrans TruncTrans
------------------------------ ---------- ---------- ---------- ---------- ----------
(1) 15:19:52 -> 15:36:06 0 0 29003 14736 45087
(2) 15:36:07 -> 15:44:45 0 1 25336 21381 44972
(3) 15:44:46 -> 15:53:20 0 0 28461 18266 47073
(4) 15:53:21 -> 16:01:53 0 0 28064 21291 50638
(5) 16:01:54 -> 16:10:20 0 0 25798 21916 47059
(6) 16:10:21 -> 16:18:52 0 0 29100 16534 46656
(7) 16:18:53 -> 16:27:18 0 0 29549 21620 51606
(8) 16:27:19 -> 16:35:40 0 0 28150 21671 49823
(9) 16:35:41 -> 16:44:03 0 1 28595 22436 51146
(10) 16:44:04 -> 16:52:29 0 1 30085 21560 51646
(11) 16:52:30 -> 17:00:50 0 1 28150 20976 48752
(12) 17:00:51 -> 17:09:15 0 0 30214 25685 55032
(13) 17:09:16 -> 17:17:38 0 0 36780 20848 57629
(14) 17:17:39 -> 17:25:57 0 0 31630 24724 55750
(15) 17:25:58 -> 17:34:18 0 0 32042 25999 58042
------------------------------ ---------- ---------- ---------- ---------- ----------
0 4 440957 319643 760911
…(cont) In reality, we are simply buffering transactions in the DSI SQT cache because the DSIEXEC can’t apply them to the RDB any faster – most likely due to slow RDB execution. In this case, ReadTrans is high due to the large transaction groups used by HVAR.
DSI Transaction grouping (w/ DSIHQ enabled)

Outbound DSI (Inbound WS-DSI) SQT Cache Memory
Destination Connection: SERVER.database (117)

Interval ReadGroups ReadUngrpd GroupsSent UngrpdSent XactsInGrp GrpsCommit UngrpdCmt
------------------------------ ---------- ---------- ---------- ---------- ---------- ---------- ----------
(1) 15:19:52 -> 15:36:06 4 71461 4 71461 17865.2 4 64917
(2) 15:36:07 -> 15:44:45 2 40739 2 40739 20369.5 3 83873
(3) 15:44:46 -> 15:53:20 3 59348 3 59348 19782.6 3 59348
(4) 15:53:21 -> 16:01:53 3 59377 3 59377 19792.3 3 60433
(5) 16:01:54 -> 16:10:20 7 112265 7 112265 16037.8 7 114418
(6) 16:10:21 -> 16:18:52 3 50037 3 50037 16679.0 3 83672
(7) 16:18:53 -> 16:27:18 4 85042 4 85042 21260.5 4 85736
(8) 16:27:19 -> 16:35:40 6 119391 6 119391 19898.5 5 80620
(9) 16:35:41 -> 16:44:03 5 92913 5 92913 18582.6 5 90028
(10) 16:44:04 -> 16:52:29 4 86612 4 86612 21653.0 4 88768
(11) 16:52:30 -> 17:00:50 4 90964 4 90964 22741.0 5 121689
(12) 17:00:51 -> 17:09:15 7 124864 7 124864 17837.7 4 88693
(13) 17:09:16 -> 17:17:38 10 87175 10 87175 8717.5 9 52953
(14) 17:17:39 -> 17:25:57 6 134587 6 134587 22431.1 6 144067
(15) 17:25:58 -> 17:34:18 4 84321 4 84321 21080.2 4 84182
------------------------------ ---------- ---------- ---------- ---------- ---------- ---------- ----------
72 1299096 72 1299096 18981.9 69 1303397
By comparing the various ungrouped vs. grouped transaction counters, we can derive an effective dsi_max_xacts_in_group setting. Normally, it would be about 20. However, with HVAR active we can easily bypass this restriction. All is not well, though: the commits vs. sent numbers are out of whack, suggesting retries or other issues.
DSI transaction profiler

Determines how a transaction/command can be applied
• Default is SQL language
To do so, it has to parse the commands (whereas DIST/SQT doesn’t)
• Hence the notion of the sqt_prs_cache_size
Other options:
• If Dynamic SQL (DSQL) is allowed or not
 can be disabled in the repdef or due to nulls in the pkey/where clause, etc.
• If bulk inserts will be used (contiguous inserts in the same transaction)
• If commands can be compiled
• Whether SQLDML will be used (inbound queue)
No real tuning….just be aware that it exists
• However, some of the DSI counters will signal why the options above were not used when it was expected that they should have been
DSI Scheduler: Transaction grouping

Transaction grouping is for efficiency
• Think about the overhead of rs_lastcommit with atomic transactions without grouping
• There also is overhead in sequencing via the DSI Message Queue

There is a long list of rules in the docs…

But let’s put it the simple way:
• Only CLOSED transactions can be grouped
• Only transactions from the same source can be grouped
 This is because the interface to rs_lastcommit only accepts one DBID (e.g. it is a proc in ASE)
• Only transactions fully in cache can be grouped
• Any “large” transaction (>dsi_large_xact_size) will go in its own group (cached or not)

Obviously we need to limit the group size (see the config sketch after this list)
• For non-compiled databases, we use dsi_max_xacts_in_group and dsi_xact_group_size
 Whichever is tripped first wins….
 The default dsi_max_xacts_in_group is 20….probably a good setting
 The default dsi_xact_group_size is 64K – too low – set to 4MB or so…..
– You might be tempted to set it to 2B….as is often suggested….but…..
– You then end up grouping multiple large transactions (assuming they are in cache)
• Compiled databases use the CDB limits instead (dsi_cdb_max_size, etc.)
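
A hedged RCL sketch of the two limits discussed above (connection name and the 4MB value are illustrative; connection parameters take effect on resume):

    suspend connection to RDS.rdb
    go
    alter connection to RDS.rdb set dsi_max_xacts_in_group to '20'
    go
    alter connection to RDS.rdb set dsi_xact_group_size to '4194304'
    go
    resume connection to RDS.rdb
    go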
Transaction grouping & performance

Not an overall major player unless under-grouping
• Use the counters to compare the number of ungrouped transactions vs. transaction groups sent to get an effective/computed dsi_max_xacts_in_group
• If near the config (95%+), unless the config is messed up, tuning this further likely won’t help

The key is to look at the DSI group closure counters and see if….
• Closed prematurely due to silly default configs
 Obvious fix – increase the configuration setting
• Closed prematurely due to interspersed transactions from multiple sources
 If this is the case, consider using Multiple DSI’s – one for each source
DSI Message Queue

Not quite what you think
• It is not used for sending the replicated commands to the DSIEXEC
• The DSIEXEC does a direct cache read from the DSI SQT cache

Functions of the DSI Message Queue
• Transaction dispatch messages to DSIEXECs
 The contents of the transaction group next to be executed by the next available DSIEXEC
• Transaction begin/commit sequencing messages
 DSI sends begin messages to each DSIEXEC
 Each DSIEXEC replies when the transaction has begun
 To enforce DSIEXEC serialization, the begins are either sent all at once (none) or one after the other when the previous begin has executed (wait_for_start)
 Transaction groups that can be sent in parallel are then written to the queue at once (no waiting)
– Note this is just the transaction group (e.g. list of transaction oqid’s) and not the contents
 When each DSIEXEC is ready to commit, it sends a thread ready message back to the DSI
 DSI sends a commit message to each DSIEXEC via the queue
 Serial transactions are simply held until the next sequence
• Still, quite a bit of chatter….so we have counters to monitor
 Although rarely ever a problem
DSI throughput counters

Counter_id Display_name Description
5000 DSIReadTranGroups Transaction groups read by the DSI. If grouping is disabled, grouped and ungrouped transaction counts are the same.
5002 DSIReadTransUngrouped Ungrouped transactions read by the DSI. If grouping is disabled, grouped and ungrouped transaction counts are the same.
5003 DSIReadTransIgnored Transactions determined to be duplicates by a DSI thread. Typically, some transactions are ignored at Replication Server's startup time because the outbound queue is being scanned to locate the next active command.
5005 DSIReadTransSkipped Transactions skipped by resuming a connection with the 'skip transaction' clause.
5007 DSITranGroupsSucceeded Transaction groups applied successfully to a target database by a DSI thread. This includes transactions that were successfully committed or rolled back according to their final disposition.
5009 DSITransFailed Grouped transactions failed by a DSI thread. Depending on error mapping, some transactions may be written into the exceptions log.
5011 DSITransRetried Grouped transactions retried to a target server by a DSI thread.
5019 DSIAttemptsTranRetry When a command fails due to data server errors, the DSI thread performs post-processing for the failed command. This counter records the number of retry attempts. Author's Note: This one is most useful, as a tran may be retried and successful (especially with RTL).
5020 DSITranGroupsSent Transaction groups sent to the target by a DSI thread. A transaction group can contain at most dsi_max_xacts_in_group transactions. This counter is incremented each time a 'begin' for a grouped transaction is executed.
5022 DSITransUngroupedSent Transactions contained in transaction groups sent by a DSI thread.
5024 DSITranGroupsCommit Transaction groups committed successfully by a DSI thread.
5026 DSITransUngroupedCommit Transactions in groups sent by a DSI thread that committed successfully.
5028 DSICmdsSucceed Commands successfully applied to the target database by a DSI.
5030 DSICmdsRead Commands read from an outbound queue by a DSI.
5032 DSICmdsParsedBySQT Commands parsed by an SQT thread before being read by a DSI.
5046 TextBytes Bytes written by rs_writetext commands. This count encompasses all DSI executor threads.
Some comments on throughput counters

Derived counters (see the query sketch after this list):
• dsi_rate = DSICmdsRead/<seconds>
• Effective dsi_max_xacts_in_group = DSITransUngroupedSent/DSITranGroupsSent
Comments
• DSICmdsRead refers to the DSI-Scheduler aspect
 The other modules (such as SQMR and SQT) have counters for CmdsRead as well
 SQMR & SQT CmdsRead may be higher until the SQT cache is full
 DSICmdsRead may be slightly artificially high at first due to DSIEXEC batching, but should quickly drop down to the same rate as DSIEXEC processing
 At a stable point, DSICmdsRead is the effective throughput of the DSI/DSIEXEC pipeline – which is most often controlled by the speed of the replicate database in processing transactions, or by network overhead.
• The READ, SENT, Succeeded, Committed DSITranGroups differences
 READ refers to the initial grouping as constructed by the DSI-S
 SENT can include retries due to failures
 Succeeded is groups that finished but didn’t FAIL (includes retries)
 COMMIT is those actually committed without a retry
 Observations:
– READ is likely higher than COMMIT due to some groups still in cache (not yet sent)
– SENT = Committed + DSIAttemptsTranRetry + DSITransFailed
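
Since the derived counters are simple ratios, they can be computed from saved samples. A minimal sketch, assuming the RS 15.x RSSD layout in which admin stats with the save option flushes counter values into rs_statdetail (verify the exact column names in your RSSD version; in practice you would also filter on the run and the connection's instance id):

    -- assumes rs_statdetail(counter_id, counter_val) as in RS 15.x RSSDs
    select effective_group_size =
           sum(case when counter_id = 5022 then counter_val else 0 end) * 1.0 /
           sum(case when counter_id = 5020 then counter_val else 0 end)
    from rs_statdetail
    where counter_id in (5020, 5022)  -- DSITranGroupsSent, DSITransUngroupedSent
    go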
DSI connection setup/last commit counters

Counter_id Display_name Description
5034 TransWSBIgnored Transactions ignored by a DSI thread in a warm standby configuration when switchover is performed.
5036 ExecsGetLastCommit Invocations of function rs_get_lastcommit by a DSI thread. This function is executed each time a DSI thread is started, and each time the thread is suspended and resumed.
5037 ExecsUpdLastCommit Executions of rs_update_lastcommit by a DSI. Note: These are explicit execs of rs_update_lastcommit. Implicitly, the lastcommit table is updated when a trxn is committed. See counter 'CommitsRead' for each DSI/E for a count of these implicit updates.
5038 ExecsGetSortOrder Invocations of function rs_get_sortorder. This function is executed each time a DSI thread is started, and each time the thread is suspended and resumed.
5039 ExecsGetCharSet Invocations of function rs_get_charset. This function is executed each time a DSI thread is started, and each time the thread is suspended and resumed.
5040 ExecsInitThread Invocations of rs_initialize_threads. This function is executed each time a DSI thread is started, and each time the thread is suspended and resumed.
5041 ExecsRsMarkers Invocations of function rs_marker.
5078 RSTicket rs_ticket markers processed by a DSI QM.
5117 DSIResyncSkippedTrans Transactions skipped by resuming a connection with the 'skip to resync marker' clause.

These are mainly ignorable except for certain debugging situations – listed here mainly for completeness
Non-DSIHQ Group closure counters

Counter_id Display_name Description
5042 GroupsClosedBytes Transaction groups closed by a DSI thread due to the next tran causing it to exceed dsi_xact_group_size.
5043 GroupsClosedNoneOrig Trxn groups closed by a DSI due to no open group from the origin of the next trxn. I.e., we have a new origin in the next trxn, or the Sched forced a flush of the current group from the origin, leaving no open group from that origin.
5044 GroupsClosedMixedUser Asynchronous stored procedure transaction groups closed by a DSI thread due to the next tran user ID or password being different from the ones for the current group.
5045 GroupsClosedMixedMode Transaction groups closed by a DSI thread because the current group contains asynchronous stored procedures and the next tran does not, or the current group does *not* contain asynchronous stored procedures and the next transaction does.
5063 GroupsClosedTrans Transaction groups closed by a DSI thread due to the next tran causing it to exceed dsi_max_xacts_in_group.
5068 GroupsClosedLarge Transaction groups closed by a DSI thread due to the next transaction satisfying the criteria of being large.
5069 GroupsClosedWSBSpec Transaction groups closed by a DSI thread for a Warm Standby due to the next transaction being special - empty, an enable replication marker, a subscription materialization marker, ignored due to duplicate detection, etc.
5070 GroupsClosedResume Transaction groups closed by a DSI thread due to the next transaction following the execution of the 'resume' command - whether the 'skip', 'display' or execute option was chosen.
5071 GroupsClosedSpecial Transaction groups closed by a DSI thread due to the next transaction being qualified as special - orphan, rollback, marker, duplicate, ddl, etc.
5049 GroupsClosedTranPartRule Transaction groups closed by a DSI thread because of a Transaction Partitioning rule.
Comments on group closure reasons

GroupsClosedBytes & GroupsClosedLarge are config problems
• For GroupsClosedBytes, check dsi_xact_group_size
 The default is 65536 (64KB). A better configuration is 8MB
 While some may set this to 2B, that could allow multiple large transactions to be grouped….
• For GroupsClosedLarge, check dsi_large_xact_size
 The default is 100. A better configuration is 10000+ – it depends on how large a transaction you want to force closing the previous group for….

GroupsClosedTrans is normal
• Except for DSIHQ
GroupsClosedNoneOrig can be a bad thing
• If low volume, this just indicates that the previous group was already sent and there weren’t any groups open at the time to append to…..this is okay.
• If mid-high volume, this likely indicates that the previous transaction was from a different source and MDSI likely should be used to reduce/eliminate latency
 For DSIHQ, verify with the HQ counters
Parallel DSI counters
Counter_id Display_name Description
5049 GroupsClosedTranPartRule Transaction groups closed by a DSI thread because of a Transaction Partitioning rule.
5050 PartitioningWaits Transaction groups forced to wait for another group to complete (processed serially based on Transaction Partitioning rule).
5051 UserRuleMatchGroup Times Transaction Partitioning rule USER was checked and found to be 'parallel' for GROUPING decision.
5052 UserRuleMatchDist Times Transaction Partitioning rule USER was checked and found to be 'serial' for DISTRIBUTION decision.
5053 TimeRuleMatchGroup Times Transaction Partitioning rule TIME was checked and found to be 'parallel' for GROUPING decision.
5054 TimeRuleMatchDist Times Transaction Partitioning rule TIME was checked and found to be 'serial' for DISTRIBUTION decision.
5055 NameRuleMatchGroup Times Transaction Partitioning rule NAME was checked and found to be 'parallel' for GROUPING decision.
5056 NameRuleMatchDist Times Transaction Partitioning rule NAME was checked and found to be 'serial' for DISTRIBUTION decision.
5057 AllThreadsInUse This counter is incremented each time a Parallel Transaction must wait because there are no available parallel DSI threads.
5058 AllLargeThreadsInUse This counter is incremented each time a Large Parallel Transaction must wait because there are no available parallel DSI threads.
5059 ExecsCheckThrdLock Invocations of rs_dsi_check_thread_lock by a DSI thread. This function checks for locks held by a transaction that may cause a deadlock.
5060 TrueCheckThrdLock Number of rs_dsi_check_thread_lock invocations returning true. The function determined the calling thread holds locks required by other threads. A rollback and retry occurred.
5062 CommitChecksExceeded Number of times transactions exceeded the maximum allowed executions of rs_dsi_check_thread_lock specified by parameter dsi_commit_check_locks_max. A rollback occurred.
5064 CmdGroupsRollback Command groups rolled back successfully by a DSI thread.
5066 RollbacksInCmdGroup Transactions in groups sent by a DSI thread that rolled back successfully.
5072 OriginRuleMatchGroup Times Transaction Partitioning rule ORIGIN was checked and found to be 'parallel' for GROUPING decision.
5073 OriginRuleMatchDist Times Transaction Partitioning rule ORIGIN was checked and found to be 'serial' for DISTRIBUTION decision.
5074 OSessIDRuleMatchGroup Times Transaction Partitioning rule ORIGIN_SESSID was checked and found to be 'parallel' for GROUPING decision.
5075 OSessIDRuleMatchDist Times Transaction Partitioning rule ORIGIN_SESSID was checked and found to be 'serial' for DISTRIBUTION decision.
5076 IgOrigRuleMatchGroup Times Transaction Partitioning rule IGNORE_ORIGIN was checked and found to be 'parallel' for GROUPING decision.
5077 IgOrigRuleMatchDist Times Transaction Partitioning rule IGNORE_ORIGIN was checked and found to be 'parallel' for DISTRIBUTION decision.
5087 DSIPutToSleep Number of DSI/E threads put to sleep by the DSI/S prior to loading SQT cache. These DSI/E threads have just completed their transaction.
5088 DSIPutToSleepTime Time spent by the DSI/S putting free DSI/E threads to sleep.
5092 DSIThrdRdyMsg 'Thread Ready' messages received by a DSI/S thread from its associated DSI/E threads.
5093 DSIThrdCmmtMsgTime Time spent by the DSI/S handling a 'Thread Commit' message from its associated DSI/E threads.
5095 DSIThrdSRlbkMsgTime Time spent by the DSI/S handling a 'Thread Single Rollback' message from its associated DSI/E threads.
5097 DSIThrdRlbkMsgTime Time spent by the DSI/S handling a 'Thread Rollback' message from its associated DSI/E threads.
Transaction Profile (TPF) counters

Counter_id Display_name Description
5099 DSINoDsqlNULL Number of commands that cannot use dynamic SQL statements because of NULL values in where clauses.
5100 DSINoDsqlDatatype Number of commands that cannot use dynamic SQL statements because of TEXT, IMAGE, JAVA and ineligible UDDs.
5101 DSINoDsqlRepdef Number of commands excluded from dynamic SQL by replication definition.
5102 DSINoDsqlColumnCount Number of commands excluded from dynamic SQL because the number of parameters would exceed 255.
5104 DSINoBulkDatatype Number of bulk operations skipped because the tables have datatypes incompatible with bulk.
5105 DSINoBulkFstr Number of bulk operations skipped because the tables have customized function strings for rs_insert or rs_writetext.
5106 DSINoBulkAutoc Number of bulk operations skipped because the tables have autocorrection turned on.
5107 DSINoDsqlMinColNoRepdef Number of commands excluded from dynamic SQL because minimal columns is on for the update and at least some columns of the table are not in the repdef.
5108 DSIPendingTimeOut Number of times the DSI timed out waiting for the next batch of commands while previous batch results were pending.

There are 4 different transaction profiles to consider other than normal
• Whether DSQL should be used or not
 The DSINoDsql* counters above
• Whether dsi_bulk_copy should be used for contiguous inserts in the same transaction
 The DSINoBulk* counters above
• Whether DSIHQ should be used (ASO option)
 Separate counters not illustrated
• Whether DSI Command Prefetch could be used (ASO option)
 DSIPendingTimeOut above….we will discuss this in the DSIEXEC section
 This will help if DSIEXEC batch time is fairly significant, as the DSIEXEC won’t have to wait for the DSI-S to come up with the next group….it will be ready and waiting
 Definitely consider if DSQL is enabled, or if you think multiple dsi_bulk_inserts in a row exist.
DSI Time counters

Counter_id Display_name Description
5079 DSIFindRGrpTime Time spent by the DSI/S finding a group to dispatch.
5081 DSIPrcSpclTime Time spent by the DSI/S determining if a transaction is special, and executing it if it is.
5083 DSIDisptchRegTime Time spent by the DSI/S dispatching a regular transaction group to a DSI/E.
5085 DSIDisptchLrgTime Time spent by the DSI/S dispatching a large transaction group to a DSI/E. This includes time spent finding a large group to dispatch.
5090 DSILoadCacheTime Time spent by the DSI/S loading SQT cache.
5108 DSIPendingTimeOut Number of times the DSI timed out waiting for the next batch of commands while previous batch results were pending.
5109 DSIMsgQWriteWait Message Queue Write Wait Time
5111 DSIMsgQReadWait Message Queue Read Wait Time
5113 DSIYieldTime DSI YIELD Time
5115 DSISqmMsgQWait SQM notify Message Read Wait Time
5116 DSIDSIeMsgQWait DSIe Message Read Wait Time
5093 DSIThrdCmmtMsgTime Time spent by the DSI/S handling a 'Thread Commit' message from its associated DSI/E threads.

Not too much here other than debugging
• Typically, most of the time will be spent in
 DSIMsgQReadWait (due to waiting for a response from the RDB)
 DSISqmMsgQWait (due to latency driving phys reads in order to load cache)
• Some time will be spent in
 DSILoadCacheTime….again due to SQM physical reads…likely 10-20% of the above
 DSIThrdCmmtMsgTime….doing cleanup after a commit…likely 3-5% of the above
• DSIYieldTime should be very small (milliseconds)
DSIHQ: Implementing HVAR and RTL/IQ

[Diagram recap: (1) In-memory consolidated net changes; (2) bulk load of the net changes – inserts into prod tables, updates/deletes into temp tables; (3) bulk merge from the temp tables into the production tables (RS 15.7.1+). Targets: SAP Sybase ASE (RS 15.5+), SAP HANA (RS 15.7.1 SP100+), SAP Sybase IQ (RS/RTL 15.5+).]
DSIHQ processing phases

CDB flush triggering events
• DSI transaction group closure
• CDB limit (size, cmds, sqt cache size)
• DSIEXEC ready

CDB creation
• New CDB per DSI txn group
• 1 for each DSIEXEC thread
 If parallel DSI used (not recommended)
 …or MPR alternate connections (recommended)

Execution failures/retries
• Transaction group cut back to the first 1/3rd of the txn size
• CDB recompiled on the 1/3rd txn size
• Pre-execution on the 1/3rd txn size
• If below the retry threshold, exec will be language
• If the new size < bulk threshold, exec will be language
• Smaller txn group executed
• If it succeeds, the next 1/3rd is attempted
• If it fails, it is split in 3rds again and reattempted
DSIHQ throughput counters

Counter_id Display_name Description
67000 HQCompiled Number of transactions compiled.
67001 HQCompileError Number of transaction compiler errors.
67002 HQRuleViolation Number of Compiler rule violations.
67003 HQCmdsCompiled Number of commands compiled.
67004 HQCmdsFstrNotCompiled Number of commands not compiled due to function string restriction.
67005 HQCmdsTblNotCompiled Number of commands not compiled due to table is not configured for compilation.
67006 HQCmdsReduced Number of commands being reduced as result of compilation.
67007 HQInsCompiled Number of INSERT commands compiled.
67008 HQUpdCompiled Number of UPDATE commands compiled.
67009 HQDelCompiled Number of DELETE commands compiled.
67010 HQWTextCompiled Number of WRITETEXT commands compiled.
67011 HQDTextCompiled Number of DATAROW_FOR_WRITETEXT commands compiled.
67012 HQInsReduced Number of INSERT commands reduced.
67013 HQUpdReduced Number of UPDATE commands reduced.
67014 HQDelReduced Number of DELETE commands reduced.
67015 HQLangCmds Number of SQL language commands generated.
67034 HQ_STAGED_BATCH The number of command batches containing staged operations.
67035 HQ_STAGED_INS The number of staged INSERT operations in a command batch.
67036 HQ_STAGED_UPD The number of staged UPDATE operations in a command batch.
67037 HQ_STAGED_DEL The number of staged DELETE operations in a command batch.
Comments on DSIHQ throughput counters

Derived counters (see the query sketch after this list)
• Compile % = (HQCmdsCompiled * 100.0) / DSICmdsRead
• Reduced % = (HQCmdsReduced * 100.0) / HQCmdsCompiled
• Language % = (HQLangCmds * 100.0) / HQCmdsCompiled
Comments
• Typically, unless disabled for key tables, Compile% will be very high (~100%)
• Usually, Reduced% will be very low – most often <<10% and frequently <2%
 So don’t expect huge gains here except on certain key tables (sequential key tables, queues, etc.)
• Ideally, Language% should be very low – hopefully <2% and better if <1%
 If there is a constant dribble of language commands, the time spent processing these in the RDB may exceed the bulk commands….the “constant dribble of death”
 Large numbers of language commands suggest dsi_bulk_threshold is too high or a lot of tables with small numbers of changes.
 Note that language commands are truly sent as SQL language text – DSQL is not used even if enabled
– The rationale is that one of the most common failures is datatype translation in bulk/DSQL vs. implicit with SQL language
• Otherwise, mainly focus on compilation failure reasons, bulk sizes, execution time and retries.
 Trying to improve the command reduction is a waste of time and effort
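
Under the same rs_statdetail assumption as the earlier DSI sketch, Language% can be derived the same way:

    -- assumes rs_statdetail(counter_id, counter_val); filter by run/instance as needed
    select language_pct =
           sum(case when counter_id = 67015 then counter_val else 0 end) * 100.0 /
           sum(case when counter_id = 67003 then counter_val else 0 end)
    from rs_statdetail
    where counter_id in (67003, 67015)  -- HQCmdsCompiled, HQLangCmds
    go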
DSIHQ transaction closure and noncompilation reasons

Counter_id Display_name Description
5118 HQGroupsClosedNoneOrig Description is too long. Author's Note: This is actually the description in the RSSD. The real definition is that the transactions are from different origins and can’t be grouped; MDSI should be used to avoid the problem.
5119 HQGroupsClosedCmd HQ transaction groups closed by a DSI thread due to the next transaction satisfying the criteria of being large (number of cmds). Author's Note: This cryptic description simply means that the number of commands in the CDB would have exceeded dsi_compile_max_cmds. If less than 50,000, consider increasing.
5120 HQGroupsClosedSize HQ transaction groups closed by a DSI thread due to the next transaction satisfying the criteria of being large (CDB size). Author's Note: This cryptic description simply means that the transaction data size would have exceeded dsi_cdb_max_size. If less than 2048 (2GB), consider increasing.
5121 TranNonHQForTPF Number of transactions declared non-compilable by transaction profiling processing. The transactions contain commands other than inserts, updates or deletes.
5122 TranNonHQForNoSize Number of transactions declared non-compilable because their size is unknown and the incremental compilation mechanism is deactivated.
5123 TranNonHQForTooBig Number of transactions declared non-compilable because their size is too large and the incremental compilation mechanism is deactivated.
5124 HQGroupsClosedSQTSize HQ transaction groups closed by a DSI thread due to the limitation of the max SQT cache size. Author's Note: The DSI SQT cache size kept us from compiling more….should be increased.
5125 GroupsClosedSQTSize Transaction groups closed by a DSI thread due to the limitation of the max SQT cache size. Author's Note: Non-DSIHQ, but similar – e.g. 20 10MB txns and only 128MB in DSI SQT cache.
5126 GroupsClosedDispatch Transaction groups closed by a DSI thread because there is a free DSI/E ready to accept the group. Author's Note: This is how RS avoids excess latency with HVAR…if a DSIEXEC is ready, it sends the group rather than waiting forever to hit one of the limits.
5127 GroupsClosedDispatch Transaction groups closed by a DSI thread due to the DSI/S switching to a different SQT cache. Author's Note: happens on materialization, DDL, large transactions, or system transactions.
5128 GroupsClosedDispatch Transaction groups closed by a DSI thread due to an rs_update_lastcommit command in the group.
DSIHQ Transaction group closures

Outbound DSI (Inbound WS-DSI) SQT Cache Memory
Destination Connection: SERVER.database (117)

Interval HQNoneOrig HQCDBCmd HQCDBSize HQSQTSize SQTSize Dispatch
------------------------------ ---------- ---------- ---------- ---------- ---------- ----------
(1) 15:19:52 -> 15:36:06 3 0 0 0 0 2
(2) 15:36:07 -> 15:44:45 2 0 0 0 0 1
(3) 15:44:46 -> 15:53:20 3 0 0 0 0 2
(4) 15:53:21 -> 16:01:53 4 0 0 0 0 2
(5) 16:01:54 -> 16:10:20 3 0 0 0 0 2
(6) 16:10:21 -> 16:18:52 2 0 0 0 0 1
(7) 16:18:53 -> 16:27:18 3 0 0 0 0 2
(8) 16:27:19 -> 16:35:40 4 0 0 0 0 3
(9) 16:35:41 -> 16:44:03 4 0 0 0 0 3
(10) 16:44:04 -> 16:52:29 3 0 0 0 0 2
(11) 16:52:30 -> 17:00:50 3 0 0 0 0 2
(12) 17:00:51 -> 17:09:15 5 0 0 0 0 3
(13) 17:09:16 -> 17:17:38 1 0 0 0 0 1
(14) 17:17:39 -> 17:25:57 5 0 0 0 0 3
(15) 17:25:58 -> 17:34:18 4 0 0 0 0 2
------------------------------ ---------- ---------- ---------- ---------- ---------- ----------
49 0 0 0 0 31
We are well below dsi_cdb_max_size and dsi_compile_max_cmds….and DSI SQT cache is not a limiting factor. However, we did have ~18 groups closed because the next transaction was from another source or proc – we can determine when HQNoneOrig is tripped by another source/proc vs. simply no open group from the same source by comparing it to Dispatch (49 total HQNoneOrig – 31 Dispatch = ~18).
DSIHQ time & bulk size counters

Counter_id Display_name Description
67000 HQCompiled Number of transactions compiled.
67003 HQCmdsCompiled Number of commands compiled.
67015 HQLangCmds Number of SQL language commands generated.
67016 HQCompileTime Time spent, in milliseconds, to compile commands. Author's Note: (building the CDB)
67018 HQPreExecTime Time spent, in milliseconds, to prepare for bulk. Author's Note: (creating #temp, bitmask for updates)
67020 HQExecTime Time spent, in milliseconds, in bulk execution. Author's Note: (insert/location & merge)
67022 HQPostExecTime Time spent, in milliseconds, post bulk execution. Author's Note: (cleanup #temp, commit, etc.)
67024 HQSelectTime Time spent, in milliseconds, in the select thread. Author's Note: (result set materialization for insert/location….loosely translates to insert/location time, so the rest is merge in HQExecTime)
67026 HQBulk100 Number of bulk executions of less than 100 commands.
67027 HQBulk100_500 Number of bulk executions of 100-500 commands.
67028 HQBulk500_1K Number of bulk executions of 500-1K commands.
67029 HQBulk1K_5K Number of bulk executions of 1000-5000 commands.
67030 HQBulk5K_10K Number of bulk executions of 5000-10000 commands.
67031 HQBulk10K_UP Number of bulk executions of > 10000 commands.
67032 HQXSP_TIME The amount of time, in milliseconds, spent executing transformation stored procedures when staging operations.
DSIHQ time & bulk counters

Time is the key
• HQExecTime will likely be the largest
 But you don’t want to see it go to 0 unless nothing is going on.
 0 for a sample when not idle means the previous batch is still executing
– The clock is reset with the counter and not restarted – hence the 0 (reported as a bug)
 If there is latency and this is high, look in the iqmsg log for which tables the time is spent on (for RTL/IQ)
• HQPreExecTime could have some time
 Most likely due to computing the bitmaps for updates, etc.

Bulk array counters (aka bulk matrix)
• Ideally, the larger the bulk size the better – so we want stuff in the 5K and 10K+ ranges
• However, there may not be that many rows for some tables
 So there will always be the dribs & drabs in the <100 and 500 ranges
 This is fine as long as we are hitting the CDB configs (dsi_cdb_max_size, etc.)
• If a lot are in the middle two ranges…. (see the config sketch below)
 If hitting a max on the CDB config, increase the CDB configs
 If terminating the group early, see if you can affect larger groups
– Increase SQT cache (if txns removed), use incremental compilation, use MDSI to avoid NoneOrig, etc.
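
A hedged RCL sketch for raising the CDB limits (connection name illustrative; the values follow the starting-configuration table later in this section):

    suspend connection to RDS.rdb
    go
    alter connection to RDS.rdb set dsi_compile_max_cmds to '100000'
    go
    alter connection to RDS.rdb set dsi_cdb_max_size to '2048'
    go
    resume connection to RDS.rdb
    go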
A typical time distribution

DSIHQ (RTL/HVAR) Processing Time (time in secs)
Destination Connection: iqserver.iqdb_e (117)

Interval CompileTm PreExecTm ExecTime PostExecTm SelectTime HQXSP_Time
------------------------------ ---------- ---------- ---------- ---------- ---------- ----------
(1) 11:31:17 -> 11:46:48 14.273 85.729 252.691 0.010 4.609 0.000
(2) 11:46:49 -> 11:55:04 16.570 90.098 283.793 0.006 4.732 0.000
(3) 11:55:05 -> 12:03:24 18.461 91.059 271.152 0.007 4.345 0.000
(4) 12:03:25 -> 12:11:42 19.194 77.889 263.751 0.000 4.327 0.000
(5) 12:11:43 -> 12:20:00 18.119 89.260 277.143 0.000 4.690 0.000
(6) 12:20:01 -> 12:28:23 14.778 95.922 279.077 0.000 3.586 0.000
(7) 12:28:24 -> 12:36:40 16.407 90.430 274.017 0.014 3.290 0.000
(8) 12:36:41 -> 12:44:58 15.245 87.799 268.020 0.029 4.096 0.000
(9) 12:44:59 -> 12:53:15 15.200 86.083 291.273 0.000 3.704 0.000
(10) 12:53:16 -> 13:01:30 15.410 100.113 263.456 0.005 2.653 0.000
(11) 13:01:31 -> 13:09:46 19.825 90.578 245.152 0.012 3.662 0.000
(12) 13:09:47 -> 13:18:01 14.269 89.596 270.336 0.005 3.045 0.000
(13) 13:18:02 -> 13:26:18 14.676 97.842 280.569 0.017 2.561 0.000
(14) 13:26:19 -> 13:34:35 11.530 109.085 279.396 0.011 2.075 0.000
(15) 13:34:36 -> 13:42:51 12.609 110.132 266.856 0.023 2.055 0.000
------------------------------ ---------- ---------- ---------- ---------- ---------- ----------
236.566 1391.615 4066.682 0.139 53.430 0.000
Annotations from the slide: ExecTime is the bulk of the time; PreExecTm runs up to 30% of ExecTime; PostExecTm and SelectTime are negligible to minimal (seconds); HQXSP_Time appears only if staging procs are used.
The bulk counter matrix

DSIHQ (RTL/HVAR) Bulk Apply Size Matrix
Destination Connection: iqserver.iqdb_e (117)

Interval LangCmds Bulk <100 100 -> 500 500 -> 1K 1K -> 5K 5K -> 10K 10K -> UP
------------------------------ ---------- ---------- ---------- ---------- ---------- ---------- ----------
(1) 11:31:17 -> 11:46:48 129 3391 37066 5273 50818 0 0
(2) 11:46:49 -> 11:55:04 146 4034 45489 3858 59357 0 0
(3) 11:55:05 -> 12:03:24 137 4120 41503 5254 54169 0 0
(4) 12:03:25 -> 12:11:42 112 3153 39600 8154 57050 0 0
(5) 12:11:43 -> 12:20:00 168 3839 39566 10133 60283 0 0
(6) 12:20:01 -> 12:28:23 160 3791 44468 0 54929 0 0
(7) 12:28:24 -> 12:36:40 146 3963 43336 1186 54457 0 0
(8) 12:36:41 -> 12:44:58 157 3320 37769 4271 50456 0 0
(9) 12:44:59 -> 12:53:15 184 3694 39956 1701 50981 0 0
(10) 12:53:16 -> 13:01:30 140 5284 40347 589 52295 0 0
(11) 13:01:31 -> 13:09:46 170 3695 41939 2726 52648 0 0
(12) 13:09:47 -> 13:18:01 186 4484 39323 573 49764 0 0
(13) 13:18:02 -> 13:26:18 176 6639 38132 4770 46230 0 0
(14) 13:26:19 -> 13:34:35 219 8369 36225 8260 42244 0 0
(15) 13:34:36 -> 13:42:51 193 8053 35747 10058 39641 0 0
------------------------------ ---------- ---------- ---------- ---------- ---------- ---------- ----------
2423 69829 600466 66806 775322 0 0
Annotations from the slide: the Bulk <100 column is starting to creep up – keep a watch on it… dsi_bulk_threshold=20… we can check ACTOBJ stats to see if there are tables close to that number. LangCmds is NOT SO GOOD, the small/mid bulk ranges are ~OKAY, and the 1K -> 5K range is GOOD – but that could just be due to the small number of rows modified.
The reason why

DSIHQ (RTL/HVAR) Transaction Grouping
Destination Connection: iqserver.iqdb_e (117)

Interval Xacts/Grp HQNoneOrig HQCmd HQSize HQSQTSize SQTSize SwitchSQT
------------------------------ ---------- ---------- ---------- ---------- ---------- ---------- ----------
(1) 11:31:17 -> 11:46:48 4244.2 26 0 0 0 0 0
(2) 11:46:49 -> 11:55:04 3948.2 32 0 0 0 0 0
(3) 11:55:05 -> 12:03:24 3776.0 30 0 0 0 0 0
(4) 12:03:25 -> 12:11:42 4818.5 27 0 0 0 0 0
(5) 12:11:43 -> 12:20:00 4369.7 31 0 0 0 0 0
(6) 12:20:01 -> 12:28:23 3450.5 34 0 0 0 0 0
(7) 12:28:24 -> 12:36:40 3572.0 34 0 0 0 0 0
(8) 12:36:41 -> 12:44:58 4201.3 27 0 0 0 0 0
(9) 12:44:59 -> 12:53:15 3564.1 31 0 0 0 0 0
(10) 12:53:16 -> 13:01:30 3093.1 38 0 0 0 0 0
(11) 13:01:31 -> 13:09:46 4322.5 28 0 0 0 0 0
(12) 13:09:47 -> 13:18:01 3321.8 34 0 0 0 0 0
(13) 13:18:02 -> 13:26:18 2922.7 39 0 0 0 0 0
(14) 13:26:19 -> 13:34:35 2578.4 43 0 0 0 0 0
(15) 13:34:36 -> 13:42:51 2523.1 44 0 0 0 0 0
------------------------------ ---------- ---------- ---------- ---------- ---------- ---------- ----------
54706.1 498 0 0 0 0 0
We aren’t hitting any config limits, and all the group closures are due to NoneOrig……we can verify they are all from the same source, but if we look at the SQMR counters in this case, we notice no backlog, so the smaller sizes are simply due to the fact that we really aren’t pushing things that hard and don’t have any latency (hence NoneOrig’s other cause – no latency).
Notice the transactions per group….at ~2500-4800 it is ~100-200x the config (dsi_max_xacts_in_group of 20)…this is good. If we were only 10x the config or less, it would point to grouping problems.
DSI Feature Map

Feature | ASE | Hetero | RS | ASO Req’d | Comments
Dynamic SQL (DSQL) |  |  | 15.0.1+ | | ASE, ECx & DC Any
DSI Bulk Inserts (non-ASO) |  | ECx | 15.1+ | | ASE, ECO, ECH, RTL/IQ
SQLDML |  | ? | 15.2+ | |
Non-blocking Commits |  | ORA | 15.1+ | | Possibly HANA
DSIHQ (HVAR/RTL) |  | ECx | 15.5+ |  | ASE, ECO, ECH, RTL/IQ
DSI Command Prefetch |  | ECx | 15.5+ |  |
Manual Multiple DSI |  |  | Any | | Fstring recoding
Multiple DSI w/ MPR |  |  | 15.7+ |  |
Direct command replication |  |  | 15.7.1+ | |
Direct load subscription | ? | HANA | 15.7.1 SP100+ | | Tested, but not QA’d for ASE or non-HANA targets, but works from all sources
DSI starting configuration in SRS
Config Default Recommended Rec w/ ASO Comments
db_packet_size 512 8192+ 8192+ Leverage jumbo frames
dsi_bulk_copy off on on
dsi_bulk_threshold 20 10 10
dsi_cdb_max_size 1024 (n/a) 1024 or 2048
dsi_cmd_batch_size 8192 65536 65536
dsi_cmd_prefetch off (n/a) off
dsi_compile_enable off (n/a) on Never enable at server level – only connection
dsi_compile_max_cmds 10000 (n/a) 100000+ Set to ~size+ of largest txn held in SQT cache
dsi_compile_retry_threshold 100 (n/a) 500
dsi_large_xact_size 100 100000 1000000 Set to recommended value at server level
dsi_max_cmds_in_batch 100 100 100
dsi_max_xacts_in_group 20 20+ (n/a)
dsi_non_blocking_commit 0 10 10 If RDB is Oracle or ASE
dsi_row_count_validation on off off Should be on only for WS/MSA
dsi_sqt_max_cache_size 0 32MB 512MB
dsi_xact_group_size 65536 8388608 (n/a)
dynamic_sql off See ppt See ppt See presentation
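
A hedged sketch of applying a few of the “Rec w/ ASO” values (connection name illustrative; note the table’s warning that dsi_compile_enable should only ever be set at the connection level):

    suspend connection to RDS.rdb
    go
    alter connection to RDS.rdb set dsi_sqt_max_cache_size to '536870912'
    go
    alter connection to RDS.rdb set dsi_bulk_threshold to '10'
    go
    alter connection to RDS.rdb set dsi_compile_enable to 'on'
    go
    resume connection to RDS.rdb
    go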
DSI Summary

For normal replication, there is not a lot of info here
• Look for premature group termination
• Watch the number of retries
If using DSQL or DSI_BULK_COPY
• Look for reasons the feature was not used
• Actual commands sent via the feature will be in DSIEXEC
If using HVAR/RTL (DSIHQ)
• Look at HQ group closure reasons and STRONGLY consider MDSI
 In fact, you SHOULD use MDSI and only not do so when you have a compelling reason not to
 Lack of familiarity or fear of the unknown is not a compelling reason….it is just laziness
• Look at the table details in the detailed report
 If a lot of tables have small numbers of rows, consider reducing dsi_bulk_threshold or accept the problem
• Focus on the Time and Bulk Matrix
DSIEXEC Processing
Applying data to the replicate

DSIEXEC processing sequence

[Diagram recap: DSI  DSI message queue  DSIEXEC(s)  RDS.RDB]
DSIEXEC processing (1)

Txn Begin
• DSIEXEC reads from the DSI message queue which transaction group it is supposed to work on
• Once the first batch of SQL has been sent to the RDB, it responds with a ‘begin’ message so that the DSI can coordinate the other DSI’s if parallel DSI’s are used (e.g. think wait_for_start)
• If not using parallel DSI, this is minimal time

Function String Mapping
• DSIEXEC converts each replicated row image into the target SQL dialect
• This should not take a lot of time unless there are a lot of custom function string definitions
 For really complex function strings with multiple statements, and therefore a lot more variable substitutions, consider invoking a stored procedure in the function string definition instead and only have the overhead of variable substitution in one command vs. multiple. Plus it leverages pre-optimized statements.

Batch
• DSIEXEC groups multiple statements together to send in small multi-statement chunks to the RDB for network efficiency
• If there is a significant amount of time spent in batching, it could be due to any of the following reasons:
 Premature batch flushing due to misconfiguration of the system, or batching is disabled
 Batch SQL is not being used (e.g. Dynamic SQL uses the RPC mechanism, which cannot batch SQL)
 A bulk insert operation is being used (bulk operations use a slightly different path vs. the normal one)
DSIEXEC processing (2)

Send
• This is the time spent sending the data to the RDB – not execution time
• Should be very low unless there is a lot of text/image data or a slow network
Results
• This is the time spent waiting for execution to complete and the typical ct_results() loop for each statement in the batch
• This often will be the largest block of time
• See the comments on the next page as to how to reduce it

Txn Commit
• This is the typical ‘commit tran’ and post-commit internal SRS cleanup
• Should not take much time unless parallel DSI’s are being used
 Commit sequencing time may be fairly high in such cases.
The bottlenecks in DSIEXEC processing

1. RDB compilation & optimization speed
• Each statement sent by SRS must be parsed, compiled and optimized prior to execution.
• In most cases, the DML statements sent by SRS are highly repetitive

2. RDB transaction log  Blocking Commits
• To ensure recoverability, when a transaction commits, the log records MUST be on disk
• This means waiting for a physical write (or multiple physical writes) to occur before the commit returns
• This also can happen if multiple ULC buffer flushes are necessary prior to commit

3. Serialized generation of SQL batches
• SRS waits until the previous batch executes before generating the next batch

4. Speed of atomic row-wise operations
• A single update of 100,000 rows at the primary becomes 100,000 single-row updates at the replicate. Ignoring the compilation and optimization time aspect, it still takes longer to process 100,000 atomic operations.

5. Single connection from SRS to RDB
• 3000 concurrent connections at the primary all funnel into a single connection at the replicate

6. Where clauses missing due to no repdefs

7. Slow execution in RDB (missing indexes, blocking, etc.)
#1: Replicated statement optimization speed

This is not a problem unique to SRS – it afflicts all applications on any DBMS

Pure OLTP applications should use fully prepared statements for
• DML
• Non-reporting queries (e.g. status checks, get a list of customers, etc.)
DBMS’s offer two possible solutions
• Fully prepared statements (aka “Dynamic SQL” vs. “Static SQL”)
• Statement cache with optional automatic parameterization
SRS only sends DML…so….it easily fits the profile of pure OLTP
Recommendations for ASE

Target is ASE 12.5.4, or statement cache cannot be used at all, or a heterogeneous DBMS
• Note that a login trigger can be used to
 disable statement cache for all connections other than SRS, so…
 enable literal auto parameterization for SRS connections/sessions only….
• Enable dynamic SQL in SRS for that connection
 dynamic_sql  ‘on’ (default is off)
• Tune for a fairly large number of statements
 dynamic_sql_cache_management  ‘mru’ (default is ‘fixed’)
 dynamic_sql_cache_size  1000 (default is 100)
• Monitor via RS MC
Target is ASE 15.x+ (see the sketch after this list)
• Use statement cache with literal auto parameterization
 Use a login trigger to enable literal parameterization for SRS sessions if not able to use it for the entire server (e.g. SAP apps)
• This leverages SQL batching for network efficiency and is quite a bit faster than DSQL (up to 50%)
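
A hedged sketch of both halves (login name, procedure name and connection name are illustrative; the ASE side assumes 15.x session options and that the login script procedure lives in the maintenance user's default database):

    -- ASE side: login script enabling statement cache options for the SRS maintenance user only
    create procedure srs_login_script as
        set statement_cache on
        set literal_autoparam on
    go
    sp_modifylogin rdb_maint, "login script", "srs_login_script"
    go

    -- SRS side: DSQL tuning for a pre-15.x ASE or heterogeneous target
    alter connection to RDS.rdb set dynamic_sql to 'on'
    go
    alter connection to RDS.rdb set dynamic_sql_cache_management to 'mru'
    go
    alter connection to RDS.rdb set dynamic_sql_cache_size to '1000'
    go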
Note on statement cache and datatypes

Statement cache and datatype mismatches
• Parameterized statements in the statement cache typically have placeholders based on datatype
• As a result, the following would be two distinct statements in the statement cache
 select * from table where column = 1
 select * from table where column = 1.0

Impact on replication
• The ASE replication agent often may forward whole numbers as literals (“1”) vs. numerics (“1.0”)
• DSIEXEC doesn’t convert data into SQL language text based on datatype but based on literal values
• The result is often that a connection may cause a lot of extraneous statements in the statement cache
Work-around/resolution
• Use RS dynamic SQL
• Decrease dsi_bulk_threshold to force more bulk inserts/HVAR consideration
#2: Blocking Commits

Commit Blocking
• When a transaction in a DBMS issues a commit, it typically blocks until all the changes are recorded in the transaction log
• To ensure full ACID, this requires that the physical IO’s to write the changes complete successfully prior to returning control to the client
• This can have extreme performance penalties
 In ASE, most of the contention on the log semaphore is due to each pending commit waiting on the current transaction’s physical writes.
 For short (e.g. atomic) transactions, most of the time is spent waiting on log writes
 monSysWaits/monProcessWaits WaitEventID’s 54 (semaphore) & 55 (last log page)

Non-Blocking Commits
• Both Sybase and Oracle support a form of non-blocking commit in which the commit proceeds as soon as the writes are scheduled (but before they are confirmed)
• Sybase ASE 15.0+
 set delayed_commit { on | off }
 Also available as a database option via sp_dboption
• Oracle 10gR2+
 alter session set commit_write = { nowait | immediate }
RS & Non-Blocking Commits (RS 15.1)

dsi_non_blocking_commit connection cfg parameter
• syntax (full sketch below)
 alter connection to data_server.database set dsi_non_blocking_commit to ‘##’
• Valid values 0  60 (minutes)
 0 (default) disables non-blocking commits
 3 - 10 is probably more than enough
 This value extends the time, in minutes, RS saves messages after a commit on the target database. Extending this time past the maximum time ASE can wait before the log pages are written to disk assures no data loss to the target DB.
 Obviously, it increases the amount of queue space necessary
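
An end-to-end sketch using the syntax above (connection name illustrative; 5 minutes sits inside the 3-10 range suggested):

    suspend connection to RDS.rdb
    go
    alter connection to RDS.rdb set dsi_non_blocking_commit to '5'
    go
    resume connection to RDS.rdb
    go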
New system function strings
• Function string class scope (e.g. same as rs_begin)
• rs_non_blocking_commit
 Turns on non-blocking commit for the session
• rs_non_blocking_commit_flush
 Uses a blocking commit to force a flush and reclaim queue space
Non-Blocking Commits Performance

Benchmark Configuration:
• Sun RISC – 4 1592MHz CPUs, 24GB memory
• Transaction profile: TPC-C
Results:
• Without non-blocking commit feature: 16,574 sec
• With non-blocking commit feature: 12,727 sec
• Improvement of 30.2%
Comments:
• The feature will demonstrate better performance on slower devices
 DBMS log on RAID 5 or RAID 6
• The feature will also improve performance with smaller transactions
 either due to transaction grouping rules
 or if transaction grouping is disabled
#3: Serialized generation of SQL batches

The default DSIEXEC sequence:


1. Get the next transaction group
2. Generate the SQL for the first/next batch
3. Send the SQL to the RDB (ct_send())
4. Call ct_results()
5. …and wait…
6. …and wait…
7. Go to step #2 until total transaction is finished
8. Issue rs_commit

Problem clearly seen with RS M&C


• DSIEXEC.DSIEResultTime (57063) is almost always the problem if RS is tuned appropriately
• Since ct_results() is a blocking call, the DSIEXEC can't do anything until it returns
• So the DSIEXEC time spent in DSIEFSMapTime and DSIEBatchTime becomes cumulative on top of it
 This may only amount to a few seconds of potential savings

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 185


DSI Command Prefetch (RS 15.5 with ASO)

Defers ct_results() until next SQL batch is ready


• alter connection to DS.DB set dsi_cmd_prefetch to {'on' | 'off'}
• The goal is to turn a synchronous blocking call into an asynchronous one so that work can be done
while waiting.
• Best used when DSIEFSMapTime + DSIEBatchTime is high for reasons other than bulk inserts
 If DSIEResultTime is high and DSIEBatchTime is well under 50% of it, this is not likely to help much
 Don't expect huge gains – the most that can be hidden is DSIEFSMapTime + DSIEBatchTime (non-bulk/non-HVAR)

The new sequence


1. Get the next transaction group
2. Generate the SQL for the first batch
3. Send the SQL to the RDB (ct_send())
4. Generate the next batch of SQL
5. Call ct_results()
 …and wait…maybe……depends on indexing, etc.
6. Go to step #3 until total transaction is finished
7. Issue rs_commit
© 2013 SAP AG or an SAP affiliate company. All rights reserved. 186
#4: Atomic row-wise operations

There are a number of solutions depending on requirements

Solution #1: SQLDML


• High impact SQL statements
Solution #2: dsi_bulk_copy
• Bcp/bulk loads from files
• Insert/select without SQLDML
Solution #3: HVAR/RTL (DSIHQ) – see the sketch below
• How this works is beyond the scope of this session
• Consolidates multiple transactions into bulk operations
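
A hedged configuration sketch for solutions #2 and #3, assuming a hypothetical connection RDS.rdb1 (dsi_compile_enable is the HVAR switch; HVAR additionally requires RS 15.5+ with ASO):

suspend connection to RDS.rdb1
go
alter connection to RDS.rdb1 set dsi_bulk_copy to 'on'
go
alter connection to RDS.rdb1 set dsi_compile_enable to 'on'
go
resume connection to RDS.rdb1
go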

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 187


#5: Single SRS connection

Single connection guaranteed transaction serialization


• Given the typical read/write ratios at the time, it was thought to be enough – quickly proven untrue
Legacy Parallel DSI (PDSI) (SRS 11.0+)
• Attempted to use parallel connections with concurrent execution but still enforced serialization
• Used commit sequencing controls to ensure transaction serialization was maintained
 Loose consistency with ISO 1….tight consistency with ISO 3
• Wasn’t effective in most use cases and extremely difficult to tune properly
Manual Multiple DSI (MDSI)
• Developed in the pre-11.x days as an early means of parallelism – taught in legacy SY classes
 Pseudo-supported, with the caveat that data consistency was up to the customer
• The parallel implementation technique was left to the customer to implement
Multi-Path Replication (MPR) Multiple DSI (MDSI) (SRS 15.7+ requires ASO)
• Due to demand, implemented in 15.7 as part of MPR implementation
• Supports schema (by table/proc name) and SPID/session parallelism
 Transaction name and login name to be added as parallel techniques in future
• Rest of discussion beyond the scope of this session
© 2013 SAP AG or an SAP affiliate company. All rights reserved. 188
#6: Reducing The Need for RepDefs

Prior to RS 15.7
• Table repdefs were required even in standby implementations (both WS & MSA) to
 Identify primary key columns
 Identify quoted identifier columns
• Lack of repdefs often resulted in:
 Dismal performance for updates & deletes, as the where clause was built from all non-BLOB columns; the replicate
DBMS had to compare and check every supplied column (including long comment fields), not just the primary key values
 Database inconsistencies when approximate numeric (float/real) columns were included in the where clause and different
hardware FPU processing produced slightly different values
 Errors due to unexpected reserved words in SQL

RS 15.7
• ASE 15.7 now includes pkey & quoted identifier bits in column metadata.
• Requires primary key constraint or a unique index on tables in primary database.

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 189


Comments on RepDef Elimination/Reduction

For Standby-Only Environments


• If primary and all targets are strictly standby systems (WS or MSA), this eliminates the need for replication
definitions.
• Repdefs still needed for:
 RS/RTL due to table subscriptions
 Any function string requirement
 Any other repdef property
– Different table names/owners
– Mapping column names & datatypes
– Autocorrection (note not dsi_command_convert), etc. (e.g. do not compile)

Consider
• Once RepAgent User/Executor normalizes to a replication definition…
 That repdef is associated with the command for the rest of processing
 The lack of a repdef means there is no association with a repdef
 If later you need to add a repdef (for function string manipulation), existing data in the queues will not be re-normalized to the
repdef
 In other words, it may be too late to add a repdef when needed
• Best Practice: continue to create table repdefs (a minimal sketch follows)
 Use procs/scripts/PowerDesigner to avoid creating definitions by hand en masse
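
A minimal repdef sketch for a hypothetical table dbo.orders (server, database, table and column names are illustrative only):

create replication definition orders_rd
with primary at PDS.pdb1
with all tables named dbo.orders
(order_id int, cust_id int, order_status varchar(20))
primary key (order_id)
go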

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 190


#7: Slow execution in RDB

First step is to identify the tables
• The AOBJ (active object) counters identify these for us – the reports list them
• HOWEVER, the trace flag dsi, dsi_workload must be turned on
Second step is to check the index on the pkey used by the repdef
• It is absolutely stunning how often no index exists on the pkey
• Even without a real pkey, there should minimally be a non-unique index on the repdef's listed pkey columns
Third step is to monitor the replicate database (see the sketch below)
• Use monProcessWaits on the maint user connection in ASE to find out where it is waiting and why
 If blocking, see if HVAR/bulk insert is causing escalation to a table lock, and then raise the lock escalation settings
• Use other MDA tables as necessary (monProcessStatement, monProcessActivity, etc.)
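
A hedged monitoring sketch: enable the workload trace in SRS, then watch the maintenance user's waits via the ASE MDA tables (the spid value 123 is illustrative):

-- in SRS
trace 'on', 'dsi', 'dsi_workload'
go
-- in the replicate ASE, for the maintenance user's spid
select WaitEventID, Waits, WaitTime
from master..monProcessWaits
where SPID = 123
go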

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 191


A word on counters

There is a plethora of DSIEXEC counters
• They are listed here only for completeness' sake
The key things to do:
• Rule out any configuration consideration (batch size, packet size, etc.)
• Check the DSI technique related counters (dynamic SQL, bulk copy, HVAR, etc.)
• Focus on the time counters as the key to what to do next
The rest are just details
• ….that help you isolate the actual cause once you’ve identified the problem area

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 192


DSIEXEC basic counters
counter_id display_name description
57000 TransSched Transactions groups scheduled to a DSIEXEC thread.
57001 UnGroupedTransSched Transactions in transaction groups scheduled to a DSIEXEC thread.
57002 DSIECmdsRead Commands read from an outbound queue by a DSIEXEC thread.
57005 MemUsedGroup Memory consumed by a DSI/S thread for transaction groups.
57007 BeginsRead 'begin' transaction records processed by a DSIEXEC thread.
57008 CommitsRead 'commit' transaction records processed by a DSIEXEC thread.
57009 SysTransRead Internal system transactions processed by a DSI DSIEXEC thread.
57010 InsertsRead rs_insert commands processed by a DSIEXEC thread.
57011 UpdatesRead rs_update commands processed by a DSIEXEC thread.
57012 DeletesRead rs_delete commands processed by a DSIEXEC thread.
57013 ExecsWritetext rs_writetext commands processed by a DSIEXEC thread.
57014 ExecsGetTextPtr Invocations of function rs_get_textptr by a DSIEXEC thread. This function is executed each time the thread processes a writetext command.
57120 CmdsSQLDDLRead SQLDDL commands processed by a DSI DSIEXEC thread.
57132 DSIEWaitSQT The number of times DSI/E must wait for the command it needs next to be loaded into SQT cache.
57147 DSIECmdsSucceed Commands successfully applied to the target database by a DSI/E.
57148 DSIEBytesSucceed Bytes successfully applied to the target database by a DSI/E.
57168 DSIEDeferredResult Number of times DSI/E defers ct_results call to pre-prepare the next batch.
57169 MsgQReadWaitTime Time waiting on reads from message queues.
57171 MsgQWriteWaitTime Time waiting on writes to message queues.
57175 DSIEDDLWorkload Workload for DDL expressed in bytes is the number of bytes required by the command text.
57177 DSIEWorkload Workload for this DSI expressed in bytes.
57179 DSIECmdsDirectRepRecv Number of commands received by DSIE directly from either DIST (outbound command) or EXEC (standby inbound command).
57180 DSIEGetCmdMsgQWait Time spent waiting to get next cmd from DSI-S
57183 DSIEOtherMsgQWait Time spent waiting for other events
57184 DSIEResidualParse Number of commands parsed in the second phase.
57185 DSIEDecompressionTime Time spent doing decompression

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 193


DSIEXEC errors and other syntactical issues

counter_id display_name description


57032 ErrsDeadlock Times that a DSI thread failed to apply a transaction due to deadlocks in the target database (ASE Error 1205).
57033 ErrsOutofLock Times that a DSI thread failed to apply a transaction due to no locks available in the target database (ASE Error 1204).
57034 ErrsLogFull Times that a DSI thread failed to apply a transaction due to no available log space in the target database (ASE Error 1105).
57035 ErrsLogSuspend Times that a DSI thread failed to apply a transaction due to target the database in log suspend mode (ASE Error 7415).
57036 ErrsNoConn Times that a DSI thread failed to apply a transaction due to no connections to the target database (ASE Error 1601).
57079 DSIEOCmdCount Number of output commands in command batches submitted by a DSI. Author's Note: This is a companion to DSIEICmdCount below. A replicated row image may require more than one output statement – for example, identity columns will need 3: set identity_insert on; the DML; set identity_insert off.
57082 DSIEICmdCount Number of input commands in command batches submitted by a DSI.
57167 DSIEFstrNoOutput Number of times function string none output processed. Author's Note: while generally thought of as an error, legacy systems may see this when deletes were mapped to an empty string prior to the implementation of dsi_command_convert with d2none.
57178 DSIEControlMem Number of times the memory control is executed in a DSI/E.

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 194


DSIEXEC Batching
counter_id display_name description
57070 DSIEBatchTime Time, in milli-seconds, to process command batches submitted by a DSI.
57076 DSIEBatchSize Size, in bytes, of command batches submitted by a DSI.
57079 DSIEOCmdCount Number of output commands in command batches submitted by a DSI.
57082 DSIEICmdCount Number of input commands in command batches submitted by a DSI.
57085 DSIEBFResultsProc Number of batch flushes executed because the next command is to have its results processed in a context different from the current batch.
57086 DSIEBFCommitNext Number of batch flushes executed because the next command in the transaction will be a commit.
57087 DSIEBFMaxCmds Number of batch flushes executed because we have a new command and the maximum number of commands per batch has been reached.
57088 DSIEBFRowRslts Number of batch flushes executed because we expect to have row results to process.
57089 DSIEBFRPCNext Number of batch flushes executed because the next command is an RPC.
57090 DSIEBFGetTextDesc Number of batch flushes executed because the next command is a get text descriptor command.
57091 DSIEBFBatchOff Number of batch flushes executed because command batching has been turned off.
57092 DSIEBFMaxBytes Number of batch flushes executed because the next command would exceed the batch byte limit.
57093 DSIEBFBegin Number of batch flushes executed because the next command is a 'transaction begin' command and by configuration such commands must go in a separate batch.
57094 DSIEBFSysTran Number of batch flushes executed because the next command is part of a system transaction.
57095 DSIEBFForced Number of batch flushes executed because the situation forced a flush. For example, an 'install java' command needs to be executed, or the next command is the first chunk of BLOB DDL.
57121 DSIEResSucceed The number of times a data server reported successful executions of a command batch.
57122 DSIEResFail The number of times a data server reported failed executions of a command batch.
57123 DSIEResDone The number of times a data server reported the results processing of a command batch execution as complete.
57124 DSIEResStatus The number of times a data server reported a status in the results of a command batch execution.
57125 DSIEResParm The number of times a data server reported a parameter, cursor or compute value in the results of a command batch execution.
57126 DSIEResRow The number of times a data server reported a row as being returned in the results of a command batch execution.
57127 DSIEResMsg The number of times a data server reported a message or format information as being returned in the results of a command batch execution.
57147 DSIECmdsSucceed Commands successfully applied to the target database by a DSI/E.
57148 DSIEBytesSucceed Bytes successfully applied to the target database by a DSI/E.
57157 DSIEBFBulkNext Number of batch flushes executed because the next command is a bulk copy.

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 195


Dynamic SQL, Bulk inserts & SQLDML

counter_id display_name description


57002 DSIECmdsRead Commands read from an outbound queue by a DSIEXEC thread.
57147 DSIECmdsSucceed Commands successfully applied to the target database by a DSI/E.
57149 DSIEDsqlPrepared Dynamic SQL statements prepared at target database by a DSI/E.
57150 DSIEDsqlDealloc Dynamic SQL statements deallocated at target database by a DSI/E.
57151 DSIEDsqlExecuted Dynamic SQL statements executed at target database by a DSI/E.
57152 DSIEDsqlDeallocSchema Dynamic SQL statements deallocated at replicate database by a DSI/E because of schema change.
57153 DSIEDsqlDeallocExecFail Dynamic SQL statements deallocated at replicate database by a DSI/E because the statements failed to execute.
57154 DSIENoDsqlCacheFull Number of commands excluded from dynamic SQL because the cache is full.
57155 DSIEDsqlStmtsInCache Number of dynamic SQL statements currently in cache.
57156 DSIEDsqlRetryLang Number of language commands executed after dynamic SQL commands failed.
57157 DSIEBFBulkNext Number of batch flushes executed because the next command is a bulk copy.
57158 DSIEBulkSucceed Number of times blk_done(CS_BLK_ALL) is called at the target database by a DSI/E.
57159 DSIEBulkCancel Number of times blk_done(CS_BLK_CANCEL) is called at the target database by a DSI/E.
57160 DSIEBulkRows Number of rows sent through Bulk operations by a DSI/E.
57161 BulkTime Time, in milli-seconds, spent in sending data through Bulk operation to the RDS.
57163 SQLDMLUpdRow Row counts affected by SQLDML update commands.
57164 SQLDMLDelRow Row counts affected by SQLDML delete commands.
57165 SQLDMLSelIntoRow Row counts affected by SQLDML select into commands.
57166 SQLDMLInsSelRow Row counts affected by SQLDML insert select commands.
57167 DSIEFstrNoOutput Number of times function string none output processed.
57168 DSIEDeferredResult Number of times DSI/E defers ct_results call to pre-prepare the next batch.

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 196


Parallel DSI (not Multiple DSI)

counter_id display_name description


57002 DSIECmdsRead Commands read from an outbound queue by a DSIEXEC thread.
57007 BeginsRead begin' transaction records processed by a DSIEXEC thread.
57008 CommitsRead commit' transaction records processed by a DSIEXEC thread.
57015 ExecsUpdThread Invocations of rs_update_threads by a DSIEXEC thread. This function is executed when the DSI thread is configured for parallel_dsi.
57016 GetThreadSeq Invocations of rs_get_thread_seq by a DSIEXEC thread. This function is executed when the DSI thread is configured for parallel_dsi.
57032 ErrsDeadlock Times that a DSI thread failed to apply a transaction due to deadlocks in the target database (ASE Error 1205).
57102 DSIESCCTime Time, in milli-seconds, to check the sequencing on commits.
57108 DSIESCBTime Time, in milli-seconds, to check the sequencing on command batches which required some kind of synchronization, such as 'wait_for_commit'.
57128 DSIEGiveUpConnTime The amount of time taken by a DSI/E to give up its connection to the RDB. Connections are released when the DSI/E is sent a 'go to sleep' message.
57133 DSIEGetTranTime The amount of time taken by a DSI/E to obtain control of the next logical transaction.
57135 DSIERelTranTime The amount of time taken by a DSI/E to release control of the current logical transaction.
57137 DSIEFinishTranTime The amount of time taken by a DSI/E to finish cleaning up from committing the latest tran. These clean-up activities include waking the next DSI/E (if using parallel DSI) and notifying the DSI/S.
57169 MsgQReadWaitTime Time waiting on reads from message queues.
57171 MsgQWriteWaitTime Time waiting on writes to message queues.
57173 DSIESSSTime The time taken for start sequence.
57180 DSIEGetCmdMsgQWait Time spent waiting to get next cmd from DSI-S
57181 DSIEBatchReadySeqMsgQWait Time spent waiting for seq when batch is ready.
57182 DSIECommitSeqMsgQWait Time spent waiting for seq for commit

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 197


DSIEXEC Time (!!!)
counter_id display_name description
57037 SendTime Time, in milli-seconds, spent in sending command buffers to the RDS.
57051 SendRPCTime Time, in milli-seconds, spent in sending RPCs to the RDS.
57057 SendDTTime Time, in milli-seconds, spent in sending chunks of text or image data to the RDS.
57063 DSIEResultTime Time, in milli-seconds, to process the results of command batches submitted by a DSI.
57070 DSIEBatchTime Time, in milli-seconds, to process command batches submitted by a DSI. Author's Note: This can be distorted, as dsi_bulk_copy and dsi_command_prefetch can make it appear artificially high. When using dsi_bulk_copy/dsi_command_prefetch, this counter essentially includes the DSIEResultTime.
57096 DSIEFSMapTime Time, in milli-seconds, to perform function string mapping on commands.
57114 DSIETranTime Time, in milli-seconds, to process transactions by a DSI/E thread. This includes function string mapping, sending, and processing results. A transaction may span command batches.
57130 DSIEReadTime The amount of time taken by a DSI/E to read a command from SQT cache.
57133 DSIEGetTranTime The amount of time taken by a DSI/E to obtain control of the next logical transaction.
57135 DSIERelTranTime The amount of time taken by a DSI/E to release control of the current logical transaction.
57137 DSIEFinishTranTime The amount of time taken by a DSI/E to finish cleaning up from committing the latest tran. These clean-up activities include waking the next DSI/E (if using parallel DSI) and notifying the DSI/S.
57139 DSIEParseTime The amount of time taken by a DSI/E to parse commands read from SQT.
57141 DSIEPrepareTime The amount of time taken by a DSI/E to prepare commands for execution.
57143 DSIEExecCmdTime The amount of time taken by a DSI/E to execute commands. This process includes creating command batches, flushing them, handling errors, etc.
57145 DSIEExecWrtxtCmdTime The amount of time taken by a DSI/E to execute commands related to text/image data. This process includes initializing and retrieving text pointers, flushing commands, handling errors, etc.
57161 BulkTime Time, in milli-seconds, spent in sending data through Bulk operation to the RDS.
57169 MsgQReadWaitTime Time waiting on reads from message queues.
57171 MsgQWriteWaitTime Time waiting on writes to message queues.
57180 DSIEGetCmdMsgQWait Time spent waiting to get next cmd from DSI-S
57183 DSIEOtherMsgQWait Time spent waiting for other events
57185 DSIEDecompressionTime Time spent doing decompression

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 198


Active Object (AOBJ) counters

counter_id display_name description


65000 AOBJInsertCommand Insert command on active object.
65001 AOBJUpdateCommand Update command on active object.
65002 AOBJDeleteCommand Delete command on active object.
65003 AOBJWritetextCommand Writetext command on active object.
65004 AOBJExecuteCommand Execute command on active object.
65005 AOBJInsertCommand2 Insert command on active object.
65007 AOBJUpdateCommand2 Update command on active object.
65009 AOBJDeleteCommand2 Delete command on active object.
65011 AOBJWritetextCommand2 Writetext command on active object.
65013 AOBJExecuteCommand2 Execute command on active object.
65015 AOBJEstRowSize In terms of bytes, this is the estimated size of a row associated with this object.
65017 AOBJEstLOBSize In terms of bytes, this is the estimated size of LOB values associated with this object. At the end of an operation involving LOB values, this counter is added to counter CNT_AOBJ_EST_ROWSIZE and then cleared.

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 199


Example from an SD benchmark
Active Object Total Cmds Inserts Updates Deletes WriteTexts Proc Execs
---------------------------------------- ---------- ---------- ---------- ---------- ---------- ----------

(13) 12:42:09 -> 12:54:40


SAPSR3.VBDATA 157183 49734 0 49477 57972 0
SAPSR3.VBMOD 99211 49734 0 49477 0 0
SAPSR3.FAGLFLEXA 68104 68104 0 0 0 0
SAPSR3.VBUP 49910 21635 28275 0 0 0
SAPSR3.VBFA 46145 46145 0 0 0 0
SAPSR3.ACCTCR 37680 37680 0 0 0 0
SAPSR3.FAGLFLEXT 35344 0 35344 0 0 0
SAPSR3.VBPA 19898 19898 0 0 0 0
SAPSR3.VBBS 19835 3010 14260 2565 0 0
SAPSR3.S032 18860 0 18860 0 0 0

(14) 12:54:41 -> 13:07:13


SAPSR3.VBDATA 202071 64004 0 64115 73952 0
SAPSR3.VBMOD 128119 64004 0 64115 0 0
SAPSR3.FAGLFLEXA 94812 94812 0 0 0 0
SAPSR3.VBFA 62010 62010 0 0 0 0
SAPSR3.VBUP 60830 23775 37055 0 0 0
SAPSR3.FAGLFLEXT 49944 0 49944 0 0 0
SAPSR3.ACCTCR 49760 49760 0 0 0 0
SAPSR3.BSIS 24955 24955 0 0 0 0
SAPSR3.ACCTIT 24880 24880 0 0 0 0
SAPSR3.S032 24870 0 24870 0 0 0

(15) 13:07:14 -> 13:19:47


SAPSR3.VBDATA 98705 31391 0 31366 35948 0
SAPSR3.VBMOD 62758 31392 0 31366 0 0
SAPSR3.FAGLFLEXA 49674 49674 0 0 0 0
SAPSR3.VBFA 31465 31465 0 0 0 0
SAPSR3.VBUP 28020 9670 18350 0 0 0
SAPSR3.FAGLFLEXT 26556 0 26556 0 0 0
SAPSR3.ACCTCR 24580 24580 0 0 0 0
SAPSR3.BSIS 13115 13115 0 0 0 0
SAPSR3.S032 12300 0 12300 0 0 0

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 200


DSIEXEC Feature Map

Feature | ASE | Hetero | RS | ASO Req'd | Comments
Dynamic SQL (DSQL) | ✓ | ✓ | 15.0.1+ | | ASE, ECx & DCAny
DSI Bulk Inserts (non-ASO) | ✓ | ECx | 15.1+ | | ASE, ECO, ECH, RTL/IQ
SQLDML | ✓ | ? | 15.2+ | |
Non-blocking Commits | ✓ | ORA | 15.1+ | | Possibly HANA
DSIHQ (HVAR/RTL) | ✓ | ECx | 15.5+ | ✓ | ASE, ECO, ECH, RTL/IQ
DSI Command Prefetch | ✓ | ECx | 15.5+ | ✓ |
Manual Multiple DSI | ✓ | ✓ | Any | | Fstring recoding
Multiple DSI w/ MPR | ✓ | ✓ | 15.7+ | ✓ |
Repdef elimination | 15.7+ | ? | 15.7.1+ | |

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 201


Lab #4
DSIEXEC
Lab 4: DSIEXEC

Use the same detail report as in the last lab

DSIEXEC overall
1. Where was the most time spent? What are the possible ways it could be reduced?
2. Where was the second-most time spent? What are the possible ways it could be reduced?
3. Was time spent somewhere that you likely couldn't affect much?

DSIEXEC configuration
4. Was dsi_batch_size set correctly?
5. How many commands per batch were being sent? Do you consider this effective?
6. At the packet size, how many packets would it take?

dsi_bulk_copy, dynamic SQL and SQLDML
7. Were these features effective?
8. Were there tables where they could have been used?
9. Would HVAR help?

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 203


Feedback
Please complete your session evaluation for RDP364.

Copies of slides, utilities available from jeff.tallman@sap.com

Thanks for attending this SAP TechEd session.


Extra Materials
Stuff we didn’t have time to cover
ASE RepAgent
A gradual evolution of RepAgent from tortoise to hare
ASE RepAgent: Gradual evolution

ASE process kernel mode issue

ASE 15.7 Multi-threaded RepAgent

ASE 15.7 MPR

ASE 15.7 sp100 Multiple Scanners/RepAgent filters

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 207


ASE Process Kernel

All outgoing connections used "Deferred Async Events"
• RepAgent, CIS, RPCs, etc.
• If the CPU or network was busy, the ASE kernel would deliberately defer processing the return status/results for an
outgoing connection
• The result was that RepAgent, CIS, etc. would simply block and wait (become idle)
• This was a deliberate design choice to keep RepAgent or CIS from monopolizing the DBMS

ASE in process kernel mode work-arounds (a sketch follows)
• ASE 12.5.4 ESD#10+ → bind_to_engine to bind to a lightly utilized engine (e.g. last_online)
• ASE 15.0.3+ → ltl_batch_size to increase the amount of LTL sent to SRS before an ACK is needed

15.7+ in kernel threaded mode
• RepAgent, CIS, RPCs, etc. no longer use deferred async events
• Up to 10x throughput gain.
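
A hedged sketch of the two work-arounds (the sp_config_rep_agent parameter spellings here are assumptions – verify against your ASE version's documentation):

-- pin the RepAgent to a lightly utilized engine
exec sp_config_rep_agent pdb1, 'bind to engine', '3'
go
-- send more LTL per ACK cycle
exec sp_config_rep_agent pdb1, 'ltl batch size', '262144'
go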

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 208


15.7 Multi-Threaded RepAgent

Pre-15.7 RepAgent
• Each database repagent ran as a separate session (spid) in ASE
• Sequence
 Scan from log to fill buffer
 Once operations buffer filled, convert buffer from log records to RepAgent LTL
 Once LTL buffer filled, send packets
 Scan next log page(s)

15.7 Multi-Threaded RepAgent


• Requires ASO option in RS
• Separate log scanner and sender threads
 Initially, single scanner but could have multiple senders
 Scanner reads from log and populates operations buffer
 Sender reads from operations buffer, converts to LTL and sends to RepServer.
 Still a single truncation point in the log (single log scanning thread doesn’t need more than 1)
 Any sender theoretically could process any operation
 Can send DDL for tables on a different path than DML (e.g. a long-running create index)
• Multiple senders for MPR
 Each sender could send to multiple RepServers via a single logical path with multiple physical paths (see next slide)

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 209


The Undesirable

[Diagram: a Primary site linked to a DR site over a long distance by a single replication path, annotated "???" – the undesirable single-path topology that the multi-path design on the next slide addresses.]
© 2013 SAP AG or an SAP affiliate company. All rights reserved. 210


Multi-Path Replication (MPR) w/ Logical Paths & Multiple SRS

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 211


Implementing MPR/Logical Paths

1. Configure multiple senders


• 1 for default + 1 for each logical (or physical) path
sp_config_rep_agent primary_database_name, 'multithread rep agent', 'true'
-- 3 plus 1 default
sp_config_rep_agent primary_database_name, 'max number of replication paths', '4'

2. Create each physical path to each RS for 3 non-default RA's


sp_replication_path my_pdb, 'add', pdb_sales_DR, DR_RS, DR_RS_ra_user, DR_RS_ra_pwd
sp_replication_path my_pdb, 'add', pdb_sales_RPT, RPT_RS, RPT_RS_ra_user, RPT_RS_ra_pwd

3. Create the logical path


sp_replication_path my_pdb, 'add', logical, sales_logical, pdb_sales_DR
sp_replication_path my_pdb, 'add', logical, sales_logical, pdb_sales_RPT

4. Bind the objects


sp_replication_path my_pdb, 'bind', 'table', 'dbo.fin%', 'sales_logical'

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 212


15.7.1 Multiple Scanners

Multiple scanners
• Syntax:
 sp_config_rep_agent dbname, 'multiple_scanners', {'true' | 'false'}
• Builds on multiple senders/multi-threaded RepAgents
• Adds an additional coordination thread
Additional RepAgent configuration considerations
• sp_config_rep_agent dbname, 'trunc point request interval', '##'
 This is necessary; otherwise the path with the lowest change volume would drive the truncation-point movement frequency and could cause the log to fill
• sp_configure 'replication agent memory size', ####
 Determines the size of the global pool for RepAgent schema caches, etc.
• sp_config_rep_agent 'max schema cache per scanner', ####
 Specifies how much schema cache is allocated per scanner
 Default is 512KB
• sp_config_rep_agent 'multipath distribution model', {'object'|'connection'|'filter'}
© 2013 SAP AG or an SAP affiliate company. All rights reserved. 213
ASE 15.7.1 Replication Filters

Replication Filters
• Creates a filter that can be used by multiple scanners to filter out log records that don't apply
• Syntax:
 create replication filter filter_name on table_name as filter_clause
 Filter clause can be any valid SQL expression
– Formula
– built-in function (e.g. hash), in(), like(), between…
• Filters are then bound to a replication path via sp_replication_path
Restrictions
• Filters can only be created on tables/procs in database filter is created in
• No complex expressions, such as:
 Joins
 Subqueries
 User-defined functions

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 214


Example filters
create replication filter advance_vs_filter on sales as advance * 2 > total_sales * price

create replication filter filter_in_sales on sales as total_sales between 4095 and 12000

create replication filter filter_out_sales on sales as total_sales not between 4095 and 12000

create replication filter state_in on sales as state in ("CA", "IN", "MD")

create replication filter state_out on sales as state not in ("CA", "IN", "MD")

create replication filter author_lastname on books as au_lname like "[CK]ars[eo]n"

create replication filter not_phone_num on contacts as phone not like "415%"

create replication filter advance_null on sales as advance is null

create replication filter advance_vs_totalsales on sales


as advance < 5000 or total_sales between 2000 and 2500

create replication filter older on sales as date > getdate()

create replication filter all_rows on sales as true --(more on this in a minute)

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 215


Sp_replication_path syntax and examples
sp_replication_path 'dbname', {
'add', 'physical_path', 'repserver_name', 'rs_username', 'rs_password'
| 'add', 'logical', 'logical_path', 'physical_path'
| 'drop', 'physical_path'
| 'drop', 'logical', 'logical_path' [, 'physical_path']
| 'bind', '{table | sproc | filter}', '[table_owner].object_name', 'path_name'
| 'unbind', '{table | sproc | filter | path}', 'object_name', {'path_name' | all}
| 'config', 'path_name', 'config_parameter', 'config_value'
| 'list'[, 'all | table | sproc | filter' [, 'object_name']]
}

sp_replication_path 'pdb', 'bind', 'table', 'owner1.t2', 'pdb_2'

sp_replication_path 'pdb', 'bind', 'sproc', 'sproc1', 'pdb_2'

sp_replication_path 'pdb', 'bind', 'table', 'a*', 'pdb_a'

create replication filter complete_sales on sales as status='complete'


exec sp_replication_path 'pdb', 'bind', 'filter', 'complete_sales', 'pdb_a'

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 216


Implementing multiple scanners
The first step is to consider the primary reason
• Are we trying to load-balance many user sessions to parallelize throughput?
• Are we trying to divide different schema areas of the database across parallel paths?
• Are there a few tables that we are trying to distribute based on data values?
 This implies a single session is manipulating most of the data; otherwise a session-based implementation is the better option

Determine the degree of parallelism likely needed and configure the logical paths
• Configure the scanners/senders, etc.
• Add logical paths

Set the rep agent mode appropriately
• There is only one 'rep agent' – multiple scanning threads and sending threads, but one 'rep agent'
• That one repagent can only have a single distribution model
 Object → schema-based parallelism
 Connection → user session (e.g. SPID) based parallelism
 Filter → filter-based parallelism

Bind the objects/filters to the logical paths


• Connections will be hashed across the available repagents

Restart the repagent

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 217


Some points to ponder #1: Distribution Modes

Remember, you have the ‘default’ path


• You can only bind to logical paths you create

So, let's take some common scenarios

Scenario 1: 4 huge tables we want to split by value (hashed) and 3000 other tables
• Large tables are bulk loaded – otherwise readonly
• We want to use 5 way parallelism on the huge tables
 Transaction serialization on large tables irrelevant due to bulkload.
• The target DBMS doesn't support dsi_bulk_copy or HVAR, so we need to use parallelism for the DSI to keep up

Scenario 2: 700 concurrent users doing batch processing after market close
• We think we need ~6 parallel paths to keep latency to a minimum
 Yet retain transaction serialization with respect to the same session.

Scenario 3: We have a sales/shipping/finance apps sharing same database


• They may all need to read the same tables, but only update their own tables
• Shipping updates may affect sales tables but is done via a proc
• We want each (sales, shipping, finance) to have separate replication paths to reduce latency
 While retaining transaction serialization with each application
© 2013 SAP AG or an SAP affiliate company. All rights reserved. 218
Scenario 1: 4 large tables and bulk loads

Configure the multiple scanners/senders


exec sp_config_rep_agent dbname, 'multithread rep agent', 'true'
exec sp_config_rep_agent dbname, 'multiple_scanners', 'true'
exec sp_configure 'replication agent memory size', '204800'
exec sp_config_rep_agent dbname, 'number of send buffers', '2500'
exec sp_config_rep_agent dbname, 'trunc point request interval', '15'

Create alternate connections in RS


• Add 4 alternate connections to the primary/existing connection – use different RA names

Configure the replication paths


-- 4 plus 1 default
exec sp_config_rep_agent dbname, 'max number of replication paths', '5'
exec sp_replication_path dbname, 'add', trades_p1, PROD_RS, …
exec sp_replication_path dbname, 'add', trades_p2, PROD_RS, …
exec sp_replication_path dbname, 'add', trades_p3, PROD_RS, …
exec sp_replication_path dbname, 'add', trades_p4, PROD_RS, …


© 2013 SAP AG or an SAP affiliate company. All rights reserved. 219
Scenario 1: 4 large tables and bulk loads (cont)

Create the replication filters


create replication filter trades_f1 on trades as hashbytes(ptn,trade_id,trade_party)%4=0
create replication filter trades_f2 on trades as hashbytes(ptn,trade_id,trade_party)%4=1
create replication filter trades_f3 on trades as hashbytes(ptn,trade_id,trade_party)%4=2
create replication filter trades_f4 on trades as hashbytes(ptn,trade_id,trade_party)%4=3

Bind the filters to the paths


exec sp_config_rep_agent dbname, 'multipath distribution model', 'filter'
exec sp_replication_path 'pdb', 'bind', 'filter', 'trades_f1', 'trades_p1'
exec sp_replication_path 'pdb', 'bind', 'filter', 'trades_f2', 'trades_p2'
exec sp_replication_path 'pdb', 'bind', 'filter', 'trades_f3', 'trades_p3'
exec sp_replication_path 'pdb', 'bind', 'filter', 'trades_f4', 'trades_p4'

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 220


Scenario 2: 700 concurrent batch processes

Configure the multiple scanners/senders


exec sp_config_rep_agent dbname, 'multithread rep agent', 'true'
exec sp_config_rep_agent dbname, 'multiple_scanners', 'true'
exec sp_configure 'replication agent memory size', '204800'
exec sp_config_rep_agent dbname, 'number of send buffers', '2500'
exec sp_config_rep_agent dbname, 'trunc point request interval', '15'

Create alternate connections in RS


• Add 5 alternate connections to the primary/existing connection – use different RA names

Configure the replication paths


-- 5 plus 1 default
exec sp_config_rep_agent dbname, 'max number of replication paths', '5'
exec sp_replication_path dbname, 'add', spid_alt1, PROD_RS, …
exec sp_replication_path dbname, 'add', spid_p2, PROD_RS, …
exec sp_replication_path dbname, 'add', spid_p3, PROD_RS, …
exec sp_replication_path dbname, 'add', spid_p4, PROD_RS, …
exec sp_replication_path dbname, 'add', spid_p5, PROD_RS, …

Set the distribution mode


exec sp_config_rep_agent dbname, 'multipath distribution model', 'connection'

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 221


Scenario 3: sales/shipping/finance apps

Configure the multiple scanners/senders


exec sp_config_rep_agent dbname, 'multithread rep agent', 'true'
exec sp_config_rep_agent dbname, 'multiple_scanners', 'true'
exec sp_configure 'replication agent memory size', '204800'
exec sp_config_rep_agent dbname, 'number of send buffers', '2500'
exec sp_config_rep_agent dbname, 'trunc point request interval', '15'

Create alternate connections in RS


• Add 3 alternate connections to the primary/existing connection – use different RA names
• Non-shipping/sales/finance will continue to use default

Configure the replication paths


-- 3 plus 1 default
exec sp_config_rep_agent dbname, 'max number of replication paths', '4'
exec sp_replication_path dbname, 'add', dbn_shipping, PROD_RS, …
exec sp_replication_path dbname, 'add', dbn_finance, PROD_RS, …
exec sp_replication_path dbname, 'add', dbn_sales, PROD_RS, …

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 222


Scenario 3: sales/shipping/finance apps (cont)

Bind the schemas to the paths


exec sp_config_rep_agent dbname, 'multipath distribution model', 'object'
exec sp_replication_path 'pdb', 'bind', 'table', 'shipping.*', 'dbn_shipping'
exec sp_replication_path 'pdb', 'bind', 'table', 'finance.*', 'dbn_finance'
exec sp_replication_path 'pdb', 'bind', 'table', 'sales.*', 'dbn_sales'
exec sp_replication_path 'pdb', 'bind', 'sproc', 'upd_sale_ship_status', 'dbn_shipping'

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 223


Some points to ponder #2: Replication Filters vs. Paths

We have both logical and physical paths


• They could be to different SRSs – and end up in different replicate databases
…and the default
• If the filter doesn't apply, the row will be distributed on the default connection
• So there really is no way to *only* distribute "completed" sales
 However, we can set up a filter such that only completed sales go to a specific connection (e.g. a reporting system)

Multiple paths → single target (no filter overlap)
• Make sure filters bound to paths that replicate to the same target are mutually exclusive
Multiple paths → different targets (filter overlap allowed; see the sketch below)
• The default path sends data to the standby system
• Use an always-true filter ('create replication filter filter_name on tablename as true') to send all the data to one target
 In addition to whatever other data it might be interested in – e.g. an auditing system gets all the data on a separate connection/RS
• Use a filter with an expression to send only a subset to a different target
 Completed sales get sent on a different path, through a different RS, to the reporting server
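
A hedged sketch combining both patterns, with hypothetical filter and path names (the always-true filter feeds an audit path everything; the expression filter feeds a reporting path only completed sales):

create replication filter all_sales on sales as true
go
create replication filter done_sales on sales as status = 'complete'
go
exec sp_replication_path pdb, 'bind', 'filter', 'all_sales', 'audit_path'
exec sp_replication_path pdb, 'bind', 'filter', 'done_sales', 'rpt_path'
go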

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 224


Some points to ponder #3: Replication filters on column values

Be careful if updates to columns can cause a row to be sent on multiple paths

Consider the following filters:


create replication filter state_in on insure_policy as state in ("CA", "NJ", "NY")
create replication filter state_out on insure_policy as state not in ("CA", "NJ", "NY")

Now consider the following scenarios:


• Scenario 1: a quick insert of a policy in “CA” and then updated to “FL”
• Scenario 2: a single update of a policy from “CA” to “FL”
Topology: Multipath replication to the same target
• Could the update arrive ahead of the insert due to a faster path?
Topology: Different paths to different targets (e.g. individual state copies)
• Do we leave the policy in the CA target stranded or do we need to somehow delete it?
Lesson: filter on columns that are static or rarely change

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 225


RepAgent Feature Map

Feature | ASE 15.0.3+ | ASE 15.7 threaded mode | ASE 15.7 sp100+ | ASO Req'd | Comments
Bind_to_engine | ✓ | ✓ | ✓ | |
LTL batch size | ✓ | ✓ | ✓ | |
Multi-threaded RepAgent | | ✓ | ✓ | ✓ |
RepAgent MPR w/ multiple senders | | ✓ | ✓ | |
RepAgent MPR w/ multiple scanners | | | ✓ | ✓ |
© 2013 SAP AG or an SAP affiliate company. All rights reserved. 226


Direct Load
A really neat feature for HANA & ASE
Direct Load

A new feature in RS Carina (RS 15.7.1 SP 100+)

Supports true zero-down time subscription materialization into HANA

At this time, only HANA & ASE are supported as target


• [ORA, MSSQL, DB2, ASE] → HANA supported as of SP100+
• ASE → ASE supported as of SP102+
All RS’s in routes must be RS 15.7.1 SP100+
• Both the primary RS and replicate RS
• The sysadmin site_version as well as route_version must be 1571100 or higher
Restrictions
• Cannot be from or to a logical connection (Warm Standby) (convert to MSA)
• For ASE, unless using all datarows locking on all tables, set max_mat_load_threads to 1
 Otherwise, the parallel apply threads will block each other on same page and possible undetected application deadlock

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 228


Creating subscriptions for HANA or ASE with direct_load

Create subscription adds direct load for HANA/ASE


• A separate materialization/ “catchup” queue is created
• RRS serving HANA will connect back to PDS.PDB and issue SELECT
 For ASE, this will directly connect to ASE – use sapsa or other valid login
 For RAX, it will connect to repagent and repagent will issue query instead of SRS
– This avoids the SRS needing the drivers for the source database
• Data is streamed from select statement into materialization queue
• SRS inserts data into HANA/ASE using bulk inserts
 Commits every 10,000 rows by default
 Loads 10K rows in mere seconds

Syntax:
-- Subscription IDS_aseserver1.IDS.SAPSR3.KEKO -> HAN_DB.SAP_ECC.SAP_ECC.KEKO
-- Table Replication Definition: IDS1_SAPSR3_KEKO_rd
create subscription IDS1_2_HAN_KEKO_sub
for IDS1_SAPSR3_KEKO_rd
with replicate at HAN_DB.SAP_ECC
without holdlock direct_load user sapsa password Sybase1234
subscribe to truncate table
go
(The direct_load user here could be SAPSR3, if you know that password.)
© 2013 SAP AG or an SAP affiliate company. All rights reserved. 229
Checking subscription progress

For most subscriptions – mere seconds


• No need to check in flight – just check when done
For large tables, check subscription reports status
• This is an example of a subscription that is on the cusp of completing…
check subscription IDS1_2_HAN_TWTY_sub
for IDS1_SAPSR3_TWTY_rd
with replicate at HAN_DB.SAP_ECC
go

Subscription IDS1_2_HAN_TWTY_sub has been MATERIALIZED at the replicate.


Subscriptions IDS1_2_HAN_TWTY_sub progress: catchup, 100% done, 0 commands remaining.

• When complete, you will see


Subscription IDS1_2_HAN_TWTY_sub is VALID at the replicate.

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 230


The advantages of direct load

Eliminates need for manual/bulk materialization


• It does it for you – and you don’t need to know the DBMS bulk load tools
Completely zero down-time materialization
• By using the atomic subscription markers, the data is moved without any loss even if business transactions
are in flight.

No huge DSI backlog while subscription materializing


• With normal atomic subscription materialization, the normal DSI is suspended while the materialization
occurs.
• Data affecting other tables is then held back and can cause a significant backlog in SRS
• This backlog can take a long time to drain before the next subscription can be attempted.

This translates into you can take your time to set up RS


• No rushing to minimize down-time – it can be done over weeks
• This allows RS tuning to be done as load/data increases, in step with the system build-out

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 231


Direct Load in SRS 15.7.1 SP102+

When you create a subscription with direct load, SRS does the following
1. Creates a catch-up queue for inflight DMLs
2. Sends usual begin subscription marker to primary ASE
3. Opens a direct connection from the RRS to the primary database/RAX to select the data
4. As the data is retrieved, SRS splits the rows round-robin across the parallel apply threads
 Number of threads controlled by max_mat_load_threads
 Each thread uses bulk insert to send data to replicate
 Each thread commits every mat_max_tran_size rows
 The apply threads are separate from normal DSI
– no need to suspend it as with normal subscriptions using atomic materialization
5. Any new transaction/DML that happens during materialization is held in catch-up queue
 So…if a tran affects 3 tables (A, B, C) and we are materializing C
– A&B are applied as normal by the DSI
– Rows for C are sent to the side to the catch-up queue
6. When select completes, SRS starts to apply catch-up queue
 Inserts are sent as delete/insert. Updates as normal (no D/I conversion)
 As a result, updates on PKey will cause subscription to fail (row missing)
7. When SRS is nearly complete with catch-up queue, end marker is put into primary log
8. When end-marker is seen by SRS, it directs new DML to normal DSI queue
 And then tears down the catch-up queue.

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 232


Direct load internals – initial materialization

[Diagram: initial materialization. (1) A monitor thread oversees the select thread reading table C from the source DB; (2) the max_mat_apply_threads bulk-insert C into HANA in parallel; (3) in-flight transactions touching C are routed by the RSI-U through the catch-up SQM into the catch-up queue, while work for tables A and B continues through the DIST, outbound SQM/outbound queue, and the normal DSI/DSI-EXEC.]
© 2013 SAP AG or an SAP affiliate company. All rights reserved. 233


Direct load internals – catch-up phase
[Diagram: catch-up phase. The select has completed; the catch-up DSI and its catch-up DSI-EXEC drain the held-back table C transactions from the catch-up queue into HANA, while new work for tables A and B continues through the DIST, outbound queue, and normal DSI/DSI-EXEC.]
© 2013 SAP AG or an SAP affiliate company. All rights reserved. 234


Direct load internals - completed
[Diagram: completed. The catch-up queue has been drained and torn down; all new DML, including table C, now flows through the DIST, outbound SQM/outbound queue, and the normal DSI/DSI-EXEC into HANA.]
© 2013 SAP AG or an SAP affiliate company. All rights reserved. 235


Direct load internals – full picture
[Diagram: full picture. All phases (1)-(7) overlaid: select and parallel bulk apply, catch-up capture and drain, and the final hand-off of new DML back to the outbound queue and normal DSI.]
© 2013 SAP AG or an SAP affiliate company. All rights reserved. 236


The explosion of threads

Did you count those threads???


• Total threads for EACH concurrent subscription = 10+ by default
 Select thread
 Monitor thread
 Apply threads (max_mat_apply_threads)
 Catch-up SQM thread
 Catch-up DSI thread
 Catch-up DSI-EXEC thread
• Make sure
 Num_concurrent_subs is set high enough (prevent errors)
 Num_threads is configured high enough
– For 5 concurrent subscriptions, you will need 50 more threads than default
– Increase num_threads appropriately

Be very careful with changing max_mat_apply_threads from default


• If doing an optimized subscription creation you might be able to play with it, but otherwise, you risk a huge
thread explosion and failure…

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 237


Sizing RS for concurrent subscriptions

Concurrent subscriptions vs. cm_max_connections/num_threads


• cm_max_connections (default is 64)
 RS total connections not just database connections.
• The default for max_mat_load_threads is 5.
 Set to 1 for ASE if using APL or datapage locking and if there are clustered indexes
 Lower tends to work better than higher – e.g. 3 outperforms 10 in some cases
• For each create subscription, you would need 10+ threads
 Num_threads to add = (max_mat_load_threads + 5) * concurrent subscriptions
 When you create 10 subscriptions at the same time, you would need 100+ threads.
• You need to set both num_threads and cm_max_connections to a much bigger number.
 This likely has an impact on num_mutexes as well….

No need to change actual DSI configs


• E.g. you don’t need to change dsi_num_threads – the above doesn’t count towards it.

Recommendations (a sketch follows)
• num_concurrent_subs to 40+ (possibly 50)
• num_threads by at least num_concurrent_subs*10
• num_stable_queues to (connections x 2 + num_concurrent_subs + 10) or 50 (whichever greater)
• cm_max_connections by ~num_concurrent_subs*5 (e.g. the default is 64 – set to 300)
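
A hedged sketch applying the recommendations above from isql against the SRS (the values are illustrative, not prescriptive):

configure replication server set num_concurrent_subs to '50'
go
configure replication server set num_threads to '500'
go
configure replication server set num_stable_queues to '80'
go
configure replication server set cm_max_connections to '300'
go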
© 2013 SAP AG or an SAP affiliate company. All rights reserved. 238
Admin who with concurrent direct_load
Spid Name State Info
---- ---------- -------------------- ------------------------------------------------------------
28 DSI EXEC Awaiting Command 112(1) HAN_REP_RSSD.HAN_REP_RSSD
19 DSI Awaiting Message 112 HAN_REP_RSSD.HAN_REP_RSSD
23 DIST Awaiting Wakeup 112 HAN_REP_RSSD.HAN_REP_RSSD
27 SQT Awaiting Wakeup 112:1 DIST HAN_REP_RSSD.HAN_REP_RSSD
13 SQM Awaiting Message 112:1 HAN_REP_RSSD.HAN_REP_RSSD
11 SQM Awaiting Message 112:0 HAN_REP_RSSD.HAN_REP_RSSD
32 REP AGENT Awaiting Command HAN_REP_RSSD.HAN_REP_RSSD
109 DSI EXEC Awaiting Command 114(1) HAN_DB.SAP_ECC
108 DSI Active 114 HAN_DB.SAP_ECC
15 SQM Awaiting Message 114:0 HAN_DB.SAP_ECC
Normal DSI
68 DSI EXEC Awaiting Command 937(1) HAN_DB.rs_0100006780000627IDS1_2_HAN_
123 DSI Awaiting Message 937 HAN_DB.rs_0100006780000627IDS1_2_HAN_
126 SQM Awaiting Message 937:0 HAN_DB.rs_0100006780000627IDS1_2_HAN_
Catch-up DSI for Sub1
184 DSI EXEC Awaiting Command 938(1) HAN_DB.rs_0100006780000628IDS1_2_HAN_
94 DSI Awaiting Message 938 HAN_DB.rs_0100006780000628IDS1_2_HAN_
103 SQM Awaiting Message 938:0 HAN_DB.rs_0100006780000628IDS1_2_HAN_
Catch-up DSI for Sub2
17 RSI Awaiting Wakeup IDS_REP_aseserver1
12 SQM Awaiting Message 16777317:0 IDS_REP_aseserver1
18 RSI Awaiting Wakeup IDS_REP_aseserver2
14 SQM Awaiting Message 16777318:0 IDS_REP_aseserver2
31 RSI USER Active IDS_REP_aseserver1
29 RSI USER Awaiting Command IDS_REP_aseserver2
21 dSUB Sleeping
7 dCM Awaiting Message
9 dAIO Awaiting Message
24 dREC Sleeping dREC
10 dDELSEG Awaiting Message
202 USER Sleeping sa
187 USER Active sa
89 SUB Awaiting Wakeup IDS1_2_HAN_JEST_sub Sub3 waiting (post catch-up??)
6 dALARM Awaiting Wakeup
25 dSYSAM Sleeping

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 239


Best practices for direct load ASE → ASE

In all cases
• Make sure sp_dboption ‘bulkcopy’, true & sp_dboption ‘trunc log on checkpoint’, true
• Set target system sp_configure ‘number of locks’ to 5 million or so…
• In SRS, set num_concurrent_subs to 30+ but no higher than 50 or so.
If using allpage locking (APL)
• Set connection max_mat_load_threads to 1
• Load subscriptions from a large script – the goal is highly concurrent bulk loads across multiple tables
If using datapage locking (DPL)
• Change the target table to datarows locking, create the subscription per the DRL steps below, then change the table back to datapage locking
If using datarows locking (DRL)
• Set connection max_mat_load_threads between 3 and 5
• Set num_concurrent_subs to 3 to 5 (this is dynamic)
• Load subscriptions from a file – the goal is a few large tables with high insert concurrency on each table
Run update index statistics as each batch of subscriptions completes (a sketch follows)
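
A hedged checklist sketch for the ASE → ASE case (database, connection and table names are hypothetical; max_mat_load_threads as a connection-level setting follows the bullets above):

-- replicate ASE
exec sp_dboption rdb1, 'bulkcopy', true
exec sp_dboption rdb1, 'trunc log on checkpoint', true
go
exec sp_configure 'number of locks', 5000000
go
-- SRS
configure replication server set num_concurrent_subs to '30'
go
alter connection to RDS.rdb1 set max_mat_load_threads to '1'  -- APL; 3-5 for datarows locking
go
-- replicate ASE, as each batch of subscriptions completes
use rdb1
go
update index statistics orders  -- hypothetical table; repeat per materialized table
go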

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 240


Other traces to consider (generally)

Trace connection to source connection (needs debug binary/diag server):


• trace 'on', cm, cm_ct_connect
• trace 'on', cm, cm_show_connect_disconnect
SQL Language Commands
• trace 'on','DSI','DSI_CMD_DUMP'
• trace 'on','DSI','DSI_BUF_DUMP'
• trace 'on','ECONN','DSI_BUF_DUMP'
• trace 'on','DSI','DSI_TRACE_WRITETEXT'
Dynamic SQL, Bulk Inserts & HVAR/RTL
• trace 'on','RSFEATURE','RSFEATURE_DSQL'
• trace 'on','RSFEATURE','RSFEATURE_BULK1'
• trace 'on','RSFEATURE','RSFEATURE_HQ1'

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 241


Debugging with trace ‘on’,’dsi’,’dsi_buf_dump’

T. 2013/05/10 06:58:51. (182): Command(s) to 'IDS_aseserver1.IDS':


T. 2013/05/10 06:58:51. (182): 'begin transaction '
T. 2013/05/10 06:58:51. (182): Command(s) to 'IDS_aseserver1.IDS':
T. 2013/05/10 06:58:51. (182): 'select count(*) from SAPSR3.VBBS '
T. 2013/05/10 06:58:51. (182): Command(s) to 'IDS_aseserver1.IDS':
T. 2013/05/10 06:58:51. (182): 'select MANDT, MATNR, WERKS, MBDAT, LGORT, CHARG, VBTYP, BDART, PLART, OMENG,
VMENG, MEINS, NODIS, VPZUO, VPMAT, VPWRK, PRBME, UMREF, PZMNG, SOBKZ, KZVBR, SERNR, PLNKZ from SAPSR3.VBBS '
T. 2013/05/10 06:58:51. (182): Command(s) to 'IDS_aseserver1.IDS':
T. 2013/05/10 06:58:51. (182): 'execute rs_marker @rs_api = 'validate subscription 0x010000678000079e''
T. 2013/05/10 06:58:51. (182): Command(s) to 'IDS_aseserver1.IDS':
T. 2013/05/10 06:58:51. (182): 'execute rs_update_lastcommit @origin = 0, @origin_qid =
0x000000000000000000000000000000000000000000000000000000000000000000000000, @secondary_qid =
0x000000000000000000000000000000000000000000000000000000000000000000000000, @origin_time = '19000101
00:00:00:000'[0a] if @@error <> 0 rollback tr'
T. 2013/05/10 06:58:51. (182): 'ansaction[0a] commit transaction'

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 242


RS MC counters

counter_id module_name display_name description


72000 DIRMATSEL DIRMATSELFetchTime Amount of time spent by select thread in fetching rows.
72002 DIRMATSEL DIRMATSELWaitForApplyTime Amount of time spent by the select thread waiting for an Apply thread. Author's Note: This is likely an indication that max_mat_load_threads is too low or that RDB indexing is slowing down the inserts.
72004 DIRMATSEL DIRMATSELGenerateCommandTime Amount of time spent by select thread generating commands
73000 DIRMATAPP DIRMATAPPTotalTime Amount of time spent in Apply Thread
73002 DIRMATAPP DIRMATAPPWaitForSelectTime Amount of time spent by an Apply thread waiting for the Select thread. Author's Note: This is likely an indication that max_mat_load_threads is too high – e.g. the apply thread is waiting for rows – or that the source system is materializing the data too slowly.
73004 DIRMATAPP DIRMATAPPFSMapTime Amount of time spent in function string mapping
73006 DIRMATAPP DIRMATAPPDsiExecSendTime Amount of time spent in Prepare+Bind+Execute

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 243


Tips for Direct Load

Configure RRS for a ton of resources


• Increase num_concurrent_subs to 40+ (possibly 50)
• Increase the number of threads by at least num_concurrent_subs*10
• Increase num_stable_queues to (connections x 2 + num_concurrent_subs + 10) or 50 (whichever greater)
• Increase cm_max_connections by ~num_concurrent_subs*5 (e.g. the default is 64 – set to 300)
Make sure PDS is in the interfaces file for the RRS

Remember, the network may be a bottleneck


• Either for the RRS selecting the data out of PDS.PDB or for RRS to send data into the RDS.RDB
Controlling concurrent subscriptions
• In SP100, direct load does NOT block/wait on num_concurrent_subs
 If set to 50, run at most 45 concurrent subscriptions
 Remember, an RCL script containing 10 create subscription commands will be 10 concurrent subscriptions
– Unless there is a pause between them that is long enough to allow previous ones to complete
• In SP102+, direct load does block/wait on num_concurrent_subs

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 244


Some thoughts on replicate database tuning

Locking/lock escalations in ASE


• The default mat_load_tran_size is 10,000
• With parallel apply threads, if row locking, you will need 50,000+ locks
 Essentially max_mat_load_threads * mat_load_tran_size ++
• You will also need to set the lock escalation threshold extremely high
 For SAP ECC installs, both the above are handled by the install tool.

Make sure you can handle the number of connections


• Number of DSI’s:
 Normal DSI's → 1, unless using MPR or parallel DSI
 Subscription DSI's → max_mat_load_threads * num_concurrent_subs
• This could mean over 100 connections from RS into RDB

© 2013 SAP AG or an SAP affiliate company. All rights reserved. 245


RAX RepAgent Internals
Heterogeneous Replication Agent Internals & Processing
Heterogeneous Replication Agents

Heterogeneous RepAgents (RAX)


• Called RAX because the X is a placeholder: a single code base is used for all hetero RepAgents
 RAO is the Oracle branch, RADB2 is the DB2 UDB branch, etc.
• Written in Java using JavaBeans for portability
• Uses multiple java threads and numerous caches for processing



Core Processing Components

Log Reader
• This is the thread that reads the source database log
• As records are read, they are placed in the scan buffer

Operation Processor
• This thread tracks the transactional context (which records belong to which transaction)
• Determines whether an operation is to be replicated and derives the replicated operation from the log records
 – E.g. if an update was logged as a delete/insert pair, the operation processor creates a single update operation
• Operations are stored in the operation queue

Sender Thread
• Copies operations from the operation queue and writes them to the LTL Formatter's unformatted queue in the LTI buffer
• Performs the necessary processing for DDL, LOB (text/image) and other special use cases

LTL Formatter
• Converts the unformatted operations into LTL-formatted records understood by SRS

RepServer Interface
• Sends the records to Replication Server in batches for efficiency – see the ra_config sketch below
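
Batch size is tunable at the RepAgent. A brief sketch, assuming the standard ra_config command and the ltl_batch_size parameter of the heterogeneous Replication Agents (parameter names and defaults vary by agent and version – verify in your agent's documentation):

    -- Show the current LTL batch size (bytes of LTL per batch sent to SRS)
    ra_config ltl_batch_size
    go
    -- Increase it so the RepServer Interface sends fewer, larger batches
    ra_config ltl_batch_size, 200000
    go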



RAX RepAgent internals

[Diagram: RAX internal data flow] The LogScanner walks the transaction log page chain (each page carries a timestamp and a next-page pointer, e.g. page 100 ts 1000 → next page 101) and places LogRecords into the Scan Buffer. The OperationProcessor consumes the LogRecords, maintaining one TransactionContext per open transaction, and places the resulting Operations on the Operation Queue. The RASSenderThread copies Operations into the LTI Queue as unformatted commands (aka Change Sets), where the LTLFormatter threads convert them into LTL (formatted) commands in the LTL Buffer; the heterogeneous RepServer Interface then sends these on to SRS.


Further Information

SAP Public Web


scn.sap.com
www.sap.com

SAP Education and Certification Opportunities


www.sap.com/education

Watch SAP TechEd Online


www.sapteched.com/online



SAP TechEd Virtual Hands-on Workshops and SAP TechEd Online
Continue your SAP TechEd education after the event!

SAP TechEd Virtual Hands-on Workshops
 – Access hands-on workshops post-event
 – Available January – March 2014
 – Complimentary with your SAP TechEd registration
 http://saptechedhandson.sap.com/

SAP TechEd Online
 – Access replays of keynotes, Demo Jam, SAP TechEd LIVE interviews, select lecture sessions, and more!
 – View content only available online
 http://sapteched.com/online



Feedback
Please complete your session evaluation for RDP364.

Thanks for attending this SAP TechEd session.


© 2013 SAP AG or an SAP affiliate company. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG.
The information contained herein may be changed without prior notice.

Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors.

National product specifications may vary.

These materials are provided by SAP AG and its affiliated companies ("SAP Group") for informational purposes only, without representation or warranty of any kind, and
SAP Group shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP Group products and services are those that are set forth
in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.

SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and
other countries.

Please see http://www.sap.com/corporate-en/legal/copyright/index.epx#trademark for additional trademark information and notices.

