Copyright (C) 2006 Fidelity National Information Services, Inc.
May 03, 2006
Revision History | |
---|---|
Revision 1.1 | 03 May 2006 |
Revision 1.0 | 06 September 2005 |
GT.M Group Fidelity National Information Services, Inc. 2 West Liberty Boulevard, Suite 300 Malvern, PA 19355, United States of America |
GT.M Support: +1 (610) 578-4226 Switchboard: +1 (610) 296-8877 Fax: +1 (484) 595-5101 http://www.fis-gtm.com gtmsupport@fnf.com |
Table of Contents
On the secondary, it is now possible to start "helper" processes to improve the rate at which the transaction stream from the primary can be processed. mupip replicate –receiver -start now accepts an additional qualifier -he[lpers]=[m[,n]], where m is the total number of helper processes and n is the number of reader helper processes. There are additional parameters in the database file header to tune the performance of the update process and its helpers. DSE can be used to modify these parameters. (D9E10-002497)
GT.M replication can be thought of as a pipeline, where a transaction [1] is committed at the primary, transported over a TCP connection to the secondary, and committed at the secondary. While there is buffering to handle load spikes, the sustained throughput of the pipeline is limited by the capacity of its narrowest stage. Except when the bottleneck is the first stage, there will be a build up of a backlog of transactions within the pipeline. [2] Note also that there is always a bottleneck that limits throughput - if there were no bottleneck, throughput would be infinite.
Since GT.M has no control over the network from the primary to the secondary, it is not discussed here. If the network is the bottleneck, the only solution is to increase its capacity.
Unusual among database engines in not having a daemon, the GT.M database performs best when there are multiple processes accessing the database and cooperating with one another to manage it. When GT.M replication is in use at a logical dual site deployment of an application, the processes at the primary need to execute business logic to compute database updates, whereas the processes at the secondary do not. Thus, if throughput at the primary is limited by the execution of business logic, the primary can be the bottleneck, and there would be no backlog. On the other hand, if the throughput at the primary is limited by the rate at which the database can commit data, it is conceivable that the multiple processes of the primary can outperform a secondary with a solitary update process, thus causing the build-up of a backlog.
To a first approximation, there are two ways that the multiple GT.M processes of a primary that executes business logic can outperform a secondary executing only one GT.M process on identical hardware:
In order to update a database, the database blocks to be updated must first be read from disk, into the operating system buffers and thence into the GT.M global buffer cache. On the primary, the execution of business logic will itself frequently bring the blocks to be updated into the global buffer cache, since the global variables to be updated are likely to be read by the application code before they are updated.
When updating a database, the database blocks and journal generated by one process may well be written to disk by an entirely different process, which better exploits the IO parallelism of most modern operating systems.
For those situations in which the update process on the secondary is a bottleneck, GT.M V5.0-000 implements the concept of helper processes to increase database throughput on the secondary. There can be a maximum of 128 helper processes.
On the secondary, the receive server process communicates with the primary and feeds a stream of update records into the receive pool. The update process reads these update records and applies them to the journal and database files via the journal buffers and global buffer cache in shared memory. Helper processes operate as follows:
Reader helper processes read the update records in the receive pool and attempt to pre-fetch blocks to be updated into the global buffer cache, so that they are available for the update process when it needs them.
Writer helper processes help to exploit the operating system's IO parallelism the way additional GT.M processes do on the primary.
The primary interface for managing helper processes is MUPIP.
The command used to start the receiver server, mupip replicate -receiver -start now takes an additional qualifier, -he[lpers][=m[,n]] to start helper processes.
If the qualifier is not used, or if -helpers=0[,n] is specified, no helper processes are started.
If the qualifier is used, but neither m nor n is specified, the default number of helper processes with the default proportion of roles is started. In V5.0-000, the default number of aggregate helper processes is 8, of which 5 are reader helpers.
If the qualifier is used, and m is specified, but n is not specified, m helper processes are started of which floor(5*m/8) processes are reader helpers.
If both m and n are specified, m helper processes are started of which n are reader helpers. If m<n, mupip starts m readers, effectively reducing n to m.
On UNIX/Linux, helper processes are reported (by the ps command, for example) as mupip replicate -updhelper -reader and mupip replicate -updhelper -writer. On OpenVMS, readers have the prefix GTMUHR and writers have the prefix GTMUHW.
Shutting down the receiver server normally, replicate -receiver -shutdown will also shutdown all helper processes. The command mupip replicate -receiver -shutdown -he[lpers] will shut down only the helper processes leaving the receiver server and update process to continue operating.
Individual helper processes can be shut down with the mupip stop command. Fidelity recommends against this course of action except in the event of some unforseen abnormal event. |
mupip replicate -receiver -checkhealth accepts the optional qualifier -he[lpers]. If -he[lpers] is specified, the status of helper processes is displayed in addition to the status of receiver server and update process.
There are a number of parameters in the database file header that control the behavior of helper processes, and which can be tuned for performance. Although it is believed that the performance of the update process with helper processes is not very sensitive to the values of the parameters over a broad range, each operating environment will be different because the helper processes must strike a balance. For example, if the reader processes are not aggressive enough in bringing database blocks into the global buffer cache, this work will be done by the update process, but if the reader processes are too aggressive, then the cache blocks they use for these database blocks may be overwritten by the update process to commit transactions that are earlier in the update stream. [3]
The DSE dump -fileheader -u[pdproc] command can be used to get a dump of the file header including these helper process parameters, and the DSE change -fileheader command can be used to modify the values of these parameters, e.g.:
scylla ~/demo 5:54pm 1048: dse dump -fileheader -updproc File /xyz/demo/mumps.dat Region DEFAULT File /xyz/demo/mumps.dat Region DEFAULT Date/Time 20-MAY-2005 17:54:37 [$H = 60040,64477] Access method BG Global Buffers 1024 Reserved Bytes 0 Block size (in bytes) 4096 Maximum record size 4080 Starting VBN 129 Maximum key size 255 Total blocks 0x00000065 Null subscripts NEVER Free blocks 0x00000062 Standard Null Collation FALSE Free space 0x00006000 Last Record Backup 0x0000000000000001 Extension Count 100 Last Database Backup 0x0000000000000001 Number of local maps 1 Last Bytestream Backup 0x0000000000000001 Lock space 0x00000028 In critical section 0x00000000 Timers pending 0 Cache freeze id 0x00000000 Flush timer 00:00:01:00 Freeze match 0x00000000 Flush trigger 960 Current transaction 0x0000000000000001 No. of writes/flush 7 Maximum TN 0xFFFFFFFFDFFFFFFF Certified for Upgrade to V5 Maximum TN Warn 0xFFFFFFFF5FFFFFFF Desired DB Format V5 Master Bitmap Size 64 Blocks to Upgrade 0x00000000 Create in progress FALSE Modified cache blocks 0 Reference count 11 Wait Disk 0 Journal State [inactive] ON Journal Before imaging TRUE Journal Allocation 100 Journal Extension 100 Journal Buffer Size 128 Journal Alignsize 128 Journal AutoSwitchLimit 8388600 Journal Epoch Interval 300 Journal Yield Limit 8 Journal Sync IO FALSE Journal File: /xyz/demo/mumps.mjl Mutex Hard Spin Count 128 Mutex Sleep Spin Count 128 Mutex Spin Sleep Time 2048 KILLs in progress 0 Replication State ON Region Seqno 0x0000000000000001 Resync Seqno 0x0000000000000001 Resync trans 0x0000000000000001 Upd reserved area [% global buffers] 50 Avg blks read per 100 records 200 Pre read trigger factor [% upd rsrvd] 50 Upd writer trigger [%flshTrgr] 33
The records in the update stream received from the primary describe logical updates to global variables. Each update will involve reading one or more database blocks. Avg blks read per 100 records is an estimate of the number of database blocks that will be read for 100 update records. A good value to use is the average height of the tree on disk for a global variable. In V5.0-000, the default value is 200, which would be a good approximation for a small global variable (one index block plus one data block). For very large databases, the value could be increased up to 400.
The DSE command change -fileheader -avg_blks_read=n sets the value of Avg blks read per 100 Records to n for the current region.
When so requested by the update process, reader helpers will read global variables referenced by records from the receive pool. The number of records read from the receive pool will be:
(100-upd_reserved_area)*No_of_global_buffers/avg_blks_read
In other words, this field an approximate percentage (integer value 0 to 100) of the number of global buffers reserved for the update process to use, and the reader helper processes will leave at least this percentage of the global buffers for the update process to use. In V5.0-000, the default value is 50, i.e., 50% global buffers are reserved for update process and up to 50% will be filled by reader helper processes.
The DSE command change -fileheader -upd_reserved_area=n sets the value of Upd reserved area to n for the current region.
When the reader helpers have read the number of update records from the receive pool, they will suspend their reading. Whenever the update process processes Pre read trigger factor percentage of Upd reserved area, it will signal the reader helper processes to resume processing journal records and reading global variables into the global buffer cache. In V5.0-000, the default value is 50, i.e., when 50% of the upd reserved area global buffers are processed by update process, it triggers the reader helpers to resume, in case they were idle. The number of records read by update process to signal reader helpers to resume reading will be:
upd_reserved_area*pre_read_trigger_factor*No_of_global_buffers/(avg_blks_read*100)
The DSE command change -file_header -pre_read_trigger_factor=n sets the value of Pre read trigger factor to n for the current region.
One of the parameters used by GT.M to manage the database is the flush trigger. One of several conditions that triggers that causes normal GT.M processes to initiate flushing dirty buffers from the database global buffer cache is when the number of dirty buffers crosses the flush trigger. GT.M processes dynamically tune this value in normal use. In an attempt to never require the update process itself to flush dirty buffers, when the number of dirty global buffers crosses upd writer trigger factor of the flush trigger, writer helper processes start flushing dirty buffers to disk. In V5.0-000, the default value is 33, i.e., 33%.
The DSE command change -file_header -upd_writer_trigger_factor=n sets the value of Upd writer trigger factor to n for the current region.
Command Syntax: UNIX syntax (i.e., lowercase text and "-" for flags/qualifiers) is used throughout this document. VMS accepts both lowercase and uppercase text; flags/qualifiers should be preceded with "/".
The value of /helper must be a quoted string as in /helper="m,n" when both m and n are specified on OpenVMS. |
Reference Number: The reference numbers used to track software enhancements and customer support requests appear in parentheses ( ).
Platform Identifier: If a new feature or software enhancement does not apply to all platforms, the relevant platform appears in brackets [ ].
[1] Sans the use of GT.M transaction processing (TStart/TCommit), each individual update can be considered to be a miniature transaction.
[2] The design of GT.M replication is such that the primary will never slow down, and the backlog at the primary is limited entirely by the disk space available for journal files.
[3] At least on UNIX/Linux, as long as the system is not constrained in the amount of available RAM, it is probably better to err, if one must, on the side of being aggressive, since the blocks will be in the operating system's unified buffer cache.