Procedures

Download Replication Examples

repl_procedures.tar.gz contains a set of replication example scripts. Each script contains a combination of GT.M commands that accomplish a specific task. All examples in the Procedures section use these replication scripts but each example uses different script sequence and script arguments. Always run all replication examples in a test system from a new directory as they create sub-directories and database files in the current directory. No claim of copyright is made with regard to these examples. These example scripts are for explanatory purposes and are not intended for production use. YOU MUST UNDERSTAND, AND APPROPRIATELY ADJUST THE COMMANDS GIVEN IN THESE SCRIPTS BEFORE USING IN A PRODUCTION ENVIRONMENT. Typically, you would set replication between instances on different systems/data centers and create your own set of scripts with appropriate debugging and error handling to manage replication between them. Click Download repl_procedures.tar.gz to download repl_procedures.tar.gz on a test system.

repl_procedures.tar.gz includes the following scripts:

env

Sets a default environment for GT.M replication. It take two arguments:

  • The name of the instance/database directory.
  • The GT.M version.

Example: source ./env A V6.3-000A_x86_64

Here is the code:

export gtm_dist=/usr/lib/fis-gtm/$2
export gtm_repl_instname=$1
export gtm_repl_instance=$PWD/$gtm_repl_instname/gtm.repl
export gtmgbldir=$PWD/$gtm_repl_instname/gtm.gld
export gtm_principal_editing=EDITING
export gtmroutines="$PWD/$gtm_repl_instname $gtm_dist"
#export gtmroutines="$PWD/$gtm_repl_instname $gtm_dist/libgtmutil.so"
# Here is an example of setting the gtmroutines environment variable:
# if [ -e  "$gtm_dist/libgtmutil.so" ] ; then export gtmroutines="$PWD/$gtm_repl_instname $gtm_dist/libgtmutil.so"
else export gtmroutines="$PWD/$gtm_repl_instname* $gtm_dist" ; fi
# For more examples on setting GT.M related environment variables to reasonable values on POSIX shells, refer to the gtmprofile script.
#export gtmcrypt_config=$PWD/$gtm_repl_instname/config_file
#echo -n "Enter Password for gtmtls_passwd_${gtm_repl_instname}: ";export gtmtls_passwd_${gtm_repl_instname}="`$gtm_dist/plugin/gtmcrypt/maskpass|tail -n 1|cut -f 3 -d " "`"

Modify the env script according to your test environment.

db_create

Creates a new sub-directory in the current directory, a global directory file with settings from gdemsr, and the GT.M database file.

Here is the code:

mkdir -p $PWD/$gtm_repl_instname/
  $gtm_dist/mumps -r ^GDE @gdemsr
  $gtm_dist/mupip create

gdemsr contains:

change -segment DEFAULT -file_name=$PWD/$gtm_repl_instname/gtm.dat
  exit

backup_repl

Creates a backup of the replication instance file. The first argument specifies the location of the backed up replication instance file.

Here is the code:

$gtm_dist/mupip backup -replinst=$1

repl_setup

Turns on replication for all regions and create the replication instance file with the -noreplace qualifier for a BC instance.

Here is the code:

$gtm_dist/mupip set -replication=on -region "*"
  $gtm_dist/mupip replicate -instance_create -noreplace

originating_start

Starts the Source Server of the originating instance in a BC replication configuration. It takes five arguments:

  • The first argument is the name of the originating instance. This argument is also used in the name of the Source Server log file.
  • The second argument is the name of the BC replicating instance. This argument is also used in the name of the Source Server log file.
  • The third argument is port number of localhost at which the Receiver Server is waiting for a connection.
  • The optional fourth and fifth argument specify the -tlsid and -reneg qualifiers used to set up a TLS/SSL connection.
  • Example:./originating_start A B 4001

Here is the code:

$gtm_dist/mupip replicate -source -start -instsecondary=$2 -secondary=localhost:$3 -buffsize=1048576 -log=$PWD/$1/$1_$2.log $4 $5
tail -30 $PWD/$1/$1_$2.log
$gtm_dist/mupip replicate -source -checkhealth

replicating_start

Starts the passive Source Server and the Receiver Server in a BC replication configuration. It takes four arguments:

  • The first argument is the name of the replicating instance. This argument is also used in the name of the passive Source Server and Receiver Server log file name.
  • The second argument is port number of localhost at which the Source Server is sending the replication stream for the replicating instance.
  • The optional third and fourth arguments are used to specify additional qualifiers for the Receiver Server startup command.
  • Example:./replicating_start B 4001

Here is the code:

$gtm_dist/mupip replicate -source -start -passive -instsecondary=dummy -buffsize=1048576 -log=$PWD/$1/source$1_dummy.log # creates the Journal Pool
$gtm_dist/mupip replicate -receive -start -listenport=$2 -buffsize=1048576 -log=$PWD/$1/receive.log $3 $4 # starts the Receiver Server
tail -20 $PWD/$1/receive.log
$gtm_dist/mupip replicate -receive -checkhealth

suppl_setup

Turns on replication for all regions, create the supplementary replication instance file with the -noreplace qualifier, starts the passive Source Server, starts the Receiver Server of an SI replicating instance, and displays the health status of the Receiver Server and Update Process. Use this to start an SI replicating instance for the first time. It takes four arguments:

  • The first argument is the name of the supplementary instance. This argument is also used in the name of the passive Source Server and Receiver Server log files.
  • The second argument is the path to the backed up replication instance file of the originating instance.
  • The third argument is port number of localhost at which the Receiver Server is waiting for a connection.
  • The optional fourth argument is either -updok or -updnotok which determines whether the instance accepts updates.
  • The optional fifth argument specifies -tlsid which is used in setting up a TLS/SSL replication connection.

Example: ./suppl_setup P startA 4011 -updok

Here is the code:

$gtm_dist/mupip set -replication=on -region "*"
$gtm_dist/mupip replicate -instance_create -supplementary -noreplace
$gtm_dist/mupip replicate -source -start -passive -buf=1048576 -log=$PWD/$gtm_repl_instname/$1_dummy.log -instsecondary=dummy $4
$gtm_dist/mupip replicate -receive -start -listenport=$3 -buffsize=1048576 -log=$PWD/$gtm_repl_instname/$1.log -updateresync=$2 -initialize $5
tail -30 $PWD/$1/$1.log
$gtm_dist/mupip replicate -receive -checkhealth

repl_status

Displays health and backlog status information for the Source Server and Receiver Server in the current environment.

Here is the code:

echo "-----------------------------------------------------------------"
  echo "Source Server $gtm_repl_instname: "
  echo "-----------------------------------------------------------------"
  $gtm_dist/mupip replicate -source -check
  $gtm_dist/mupip replicate -source -showbacklog
  echo "-----------------------------------------------------------------"
  echo "Receiver Server $gtm_repl_instname: "
  echo "-----------------------------------------------------------------"
  $gtm_dist/mupip replicate -receive -check
  $gtm_dist/mupip replicate -rece -showbacklog

rollback

Performs an ONLINE FETCHRESYNC rollback and creates a lost and/or broken transaction file. It takes two arguments:

  • The first argument specifies the communication port number that the rollback command uses when fetching the reference point. This is the same port number that the Receiver Server used to communicate with the Source Server.
  • The second argument specifies either backward or forward.

Example: ./rollback 4001 backward

Here is the code:

$gtm_dist/mupip journal -rollback -fetchresync=$1 -$2 "*"

originating_stop

Shuts down the Source Server with a two second timeout and perform a MUPIP RUNDOWN operation.

The first argument specifies additional qualifiers for the Source Server shutdown command.

Here is the code:

$gtm_dist/mupip replicate -source -shutdown -timeout=2 $1 #Shut down the originating Source Server
  $gtm_dist/mupip rundown -region "*" #Perform database rundown

replicating_stop

Shuts down the Receiver Server with a two seconds timeout and then shuts down the passive Source Server.

Here is the code:

$gtm_dist/mupip replicate -receiver -shutdown -timeout=2 #Shut down the Receiver Server
  $gtm_dist/mupip replicate -source -shutdown -timeout=2 #Shut down the passive Source Server

replicating_start_suppl_n

Starts the passive Source Server and the Receiver Server of the supplementary instance for all startups except the first. It takes three arguments:

  • The first argument is the name of the supplementary instance. It is also used in the names of the passive Source Server and the Receiver Server log files.
  • The second argument is port number of localhost at which the Receiver Server is waiting for a connection.
  • The optional third argument is an additional qualifier for the passive Source Server startup command. In the examples, the third argument is either -updok or -updnotok.
  • The optional fourth argument is an additional qualifier for the Receiver Server startup command. In the examples, the fourth argument is either -autorollback or -noresync.
  • The optional fifth argument is -tlsid which is used to set up a TLS/SSL replication connection.

Example:./replicating_start_suppl_n P 4011 -updok -noresync

Here is the code:

$gtm_dist/mupip replicate -source -start -passive -instsecondary=dummy -buffsize=1048576 -log=$PWD/$gtm_repl_instname/$12dummy.log $3 # creates the Journal Pool
  $gtm_dist/mupip replicate -receive -start -listenport=$2 -buffsize=1048576 $4 $5 -log=$PWD/$gtm_repl_instname/$1.log # starts the Receiver Server and the Update Process
  tail -30 $PWD/$1/$1.log
  $gtm_dist/mupip replicate -receiver -checkhealth # Checks the health of the Receiver Server and the Update Process

Setting up an A→B replication configuration with empty databases

On A:

  • Turn on replication.

  • Create the replication instance file.

  • Start the Source Server.

On B:

  • Turn on replication.

  • Create a new replication instance file.

  • Start the passive Source Server.

  • Start the Receiver Server.

Example:

source ./gtmenv A V6.3-000A_x86_64 
./db_create
./repl_setup
./originating_start A B 4001
source ./gtmenv B V6.3-000A_x86_64
./db_create
./repl_setup
./replicating_start B 4001
./repl_status

The shutdown sequence is as follows:

source ./gtmenv B V6.3-000A_x86_64
./replicating_stop
source ./env A V6.3-000A_x86_64
./originating_stop

Setting up an A→B→C replication configuration with empty databases

On A:

  • Turn on replication.

  • Create the replication instance file.

  • Start the Source Server.

On B:

  • Turn on replication.

  • Create a new replication instance file.

  • Start the passive Source Server.

  • Start the Receiver Server.

  • Start the Source Server with the -propagateprimary qualifier.

On C:

  • Turn on replication.

  • Create a new replication instance file.

  • Start the passive Source Server .

  • Start the Receiver Server.

Example:

source ./gtmenv A V6.3-000A_x86_64 
./db_create
./repl_setup
./originating_start A B 4001
source ./gtmenv B V6.3-000A_x86_64
./db_create
./repl_setup
./replicating_start B 4001
./originating_start B C 4002 -propagateprimary
source ./gtmenv C V6.3-000A_x86_64
./db_create
./repl_setup
./replicating_start C 4002
./repl_status

The shutdown sequence is as follows:

source ./gtmenv C V6.3-000A_x86_64
./replicating_stop
source ./gtmenv B V6.3-000A_x86_64
./replicating_stop
./originating_stop
source ./gtmenv A V6.3-000A_x86_64
./originating_stop

Setting up an A→P replication configuration with empty databases

On A:

  • Turn on replication.

  • Create a new replication instance file.

  • Start the Source Server.

  • Immediately, after starting the Source Server but before making any updates, take a backup of the replication instance file. The instance file has the journal sequence number that corresponds to the database state at the time of starting the instance. This backup instance helps when you need to start a new Supplementary Instance without taking a backup of the Originating Instance. Retain the backup copy of the Originating Instance as you may need it in future as a checkpoint from where its supplementary instance may resume to receive updates.

[Note] Note

Note that while a backed up instance file helps start replication on the P side of A→P, it does not prevent the need for taking a backup of the database on A. You need to do a database backup/restore or an extract/load from A to P to ensure P has all of the data as on A at startup.

On P:

  • Turn on replication.

  • Create a new replication instance file with the -supplementary qualifier.

  • Start the passive Source Server.

  • Start the Receiver Server and the Update Process with -updateresync="/path/to/bkup_orig_repl_inst_file" -initialize. Use the -updateresync -initialize qualifiers only once.

  • For subsequent Receiver Server and Update Process start ups, do not use -updateresync -initialize with qualifiers. Either use -autorollback with the Receiver Server startup command or perform an explicit -fetchresync -rollback before starting the Receiver Server.

Example:

source ./gtmenv A V6.3-000A_x86_64
./db_create
./repl_setup
./originating_start A P 4000
./backup_repl startA
source ./gtmenv P V6.3-000A_x86_64
./db_create
./suppl_setup P startA 4000 -updok
./repl_status
# For subsequent Receiver Server startup for P, use:
# ./replicating_start_suppl_n P 4000 -updok -autorollback
# or 
#./rollback 4000 backward
#./replicating_start_suppl_n P 4000 -updok

The shutdown sequence is as follows:

source ./gtmenv P V6.3-000A_x86_64
./replicating_stop
source ./gtmenv A V6.3-000A_x86_64
./originating_stop

Replicating Instance Starts from Backup of Originating Instance (A→B and A→P )

The more common scenario for bringing up a replicating instance is to take a backup of the originating instance and bring it up as a replicating instance. If the backup is a comprehensive backup, the file headers store the journal sequence numbers.

On A:

  • Create a backup using -DATABASE, -REPLINST, -NEWJNLFILE=PREVLINK, and -BKUPDBJNL=DISABLE qualifiers. -DATABASE creates a comprehensive backup of the database file. -REPLINST backs up the replication instance file. -BKUPDBJNL=DISABLE scrubs all journal file information in the backup database file. As the backup of instance A is comprehensive, -NEWJNLFILE=PREVLINK cuts the back link to prior generation journal files of the database for which you are taking the backup.

  • Copy the backup of the replication instance file to the location of the backed up instance.

  • Start a new Source Server for the backed up replicating instance.

On the backed up instance:

  • Load/restore the database. If the replicating database is not from a comprehensive or database backup from the originating instance, set the journal sequence number from the originating at the instant of the backup for at least one replicated region on the replicating instance.

  • Run MUPIP REPLICATE -EDITINSTANCE command to change the name of the backed up replication instance file.

  • Start the Receiver Server for the BC replicating instance. Do not use the -UPDATERESYNC qualifier to start the receiver server of a BC replicating instance. -UPDATERESYNC is necessary when you start the Receiver Server of a SI replicating instance for the first time. Without -UDPATERESYNC, the SI replicating instance may refuse to start replication because the journal sequence number in the replicating instance may be higher than what the originating instance expects.

Example:

The following example demonstrates starting a replicating instance from the backup of an originating instance in an A→B replication configuration. Note that you do not need to perform an -updateresync when starting a BC replicating instance for the first time.

source ./gtmenv A V6.3-000A_x86_64
./db_create
./repl_setup
./originating_start A backupA 4001
./backup_repl startingA   #Preserve the backup of the replicating instance file that represents the state at the time of starting the instance. 
$gtm_dist/mumps -r %XCMD 'for i=1:1:10 set ^A(i)=i'
mkdir backupA   
$gtm_dist/mupip backup -replinst=currentstateA -newjnlfile=prevlink -bkupdbjnl=disable DEFAULT backupA
source ./gtmenv backupA V6.3-000A_x86_64
./db_create
./repl_setup
cp currentstateA backupA/gtm.repl
$gtm_dist/mupip replicate -editinstance -name=backupA backupA/gtm.repl 
./replicating_start backupA 4001 
./repl_status

The shutdown sequence is as follows:

source ./gtmenv backupA V6.3-000A_x86_64
./replicating_stop
source ./gtmenv A V6.3-000A_x86_64
./originating_stop

The following example demonstrates starting a replicating instance from the backup of an originating instance in an A→P replication configuration. Note that you need to perform an -updateresync to start a supplementary instance for the first time.

source ./gtmenv A V6.3-000A_x86_64
./db_create
./repl_setup
./originating_start A backupA 4011
./backup_repl startingA   
$gtm_dist/mumps -r %XCMD 'for i=1:1:10 set ^A(i)=i'
./backup_repl currentstateA
mkdir backupA
$gtm_dist/mupip backup -newjnlfile=prevlink -bkupdbjnl=disable DEFAULT backupA
source ./gtmenv backupA V6.3-000A_x86_64
./db_create
./suppl_setup backupA currentstateA 4011 -updok
./repl_status

The shutdown sequence is as follows:

source ./gtmenv backupA V6.3-000A_x86_64
./replicating_stop
source ./gtmenv A V6.3-000A_x86_64
./originating_stop

Switchover possibilities in an A→B replication configuration

A switchover is the procedure of switching the roles of an originating instance and a replicating instance. A switchover is necessary for various reasons including (but not limited to) testing the replicating instance preparedness to take over the role of an originating instance or bringing the originating instance down for maintenance in a way that there is minimal impact on application availability.

In an A->B replication configuration, at any given point there can be two possibilities:

  • A is ahead of B, that is A has updates which are not yet replicated to B.

  • A and B are in sync. This happens where there are no new updates on A and all pending updates are replicated to B.

The steps described in this section perform a switchover (A->B becomes B->A) under both these possibilities. When A is ahead of B, these steps generate a lost transaction file which must be applied to the new originating instance as soon as possible. The lost transaction file contains transactions which are were not replicated to B. Apply the lost transactions on the new originating instance either manually or in a semi-automated fashion using the M-intrinsic function $ZQGBLMOD(). If you use $ZQGBLMOD(), perform two additional steps (mupip replicate -source -needrestart and mupip replicate -source -losttncomplete) as part of lost transaction processing. Failure to run these steps can cause $ZQGBLMOD() to return false negatives that in turn can result in application data consistency issues.

First, choose a time when there are no database updates or the rate of updates are low to minimize the chances that your application may time out. There may be a need to hold database updates briefly during the switchover. For more information on holding database updates, refer to Instance Freeze section to configure an appropriate freezing mechanism suitable for your environment.

In an A→B replication configuration, follow these steps:

On A:

  • Shut down the Source Server with an appropriate timeout. The timeout should be long enough to replicate pending transactions to the replicating instance but not too long to cause clients to conclude that the application is not available. The GT.M default Source Server wait period is 30 seconds.

On B:

  1. Shut down the Receiver Server and the Update Process.

  2. Shut down the passive Source Server to bring down the journal pool. Ensure that you shut down the Receiver Server and Update Process first before shutting down the passive Source Server.

  3. Start B as the new originating instance.

On A:

  1. Start the passive Source Server.

  2. Perform a FETCHRESYNC ROLLBACK BACKWARD.

  3. Start the Receiver Server.

  4. Process the lost transaction file as soon as possible.

The following example runs a switchover in an A→B replication configuration.

source ./gtmenv A V6.3-000A_x86_64 # creates a simple environment for instance A
./db_create
./repl_setup # enables replication and creates the replication instance file
./originating_start A B 4001 # starts the active Source Server (A->B)
$gtm_dist/mumps -r %XCMD 'for i=1:1:100 set ^A(i)=i'
./repl_status #-SHOWBACKLOG and -CHECKHEALTH report
source ./gtmenv B V6.3-000A_x86_64 # creates a simple environment for instance B
./db_create
./repl_setup
./replicating_start B 4001 
./repl_status # -SHOWBACKLOG and -CHECKHEATH report 
./replicating_stop # Shutdown the Receiver Server and the Update Process 
source ./gtmenv A V6.3-000A_x86_64 # Creates an environment for A
$gtm_dist/mumps -r %XCMD 'for i=1:1:50 set ^losttrans(i)=i' # perform some updates when replicating instance is not available. 
sleep 2
./originating_stop # Stops the active Source Server 
source ./gtmenv B V6.3-000A_x86_64 # Create an environment for B
./originating_start B A 4001 # Start the active Source Server (B->A)
source ./gtmenv A V6.3-000A_x86_64 # Create an environment for A
./rollback 4001 backward
./replicating_start A 4001 # Start the replication Source Server 
./repl_status # To confirm whether the Receiver Server and the Update Process started correctly.
cat A/gtm.lost 

The shutdown sequence is as follows:

source ./gtmenv A V6.3-000A_x86_64
./replicating_stop
source ./gtmenv B V6.3-000A_x86_64
./originating_stop

Switchover possibilities in a B←A→P replication configuration

A requires rollback

The following scenario demonstrates a switchover from B←A→P to A←B→P when A has unreplicated updates that require rollback before B can become the new originating instance.

A

B

P

Comments

O: ... A95, A96, A97, A98, A99

R: ... A95, A96, A97, A98

S: ... P34, A95, P35, P36, A96, A97, P37, P38

A as an originating primary instance at transaction number A99, replicates to B as a BC replicating secondary instance at transaction number A98 and P as a SI that includes transaction number A97, interspersed with locally generated updates. Updates are recorded in each instance's journal files using before-image journaling.

Crashes

O: ... A95, A96, A97, A98, B61

... P34, A95, P35, P36, A96, A97, P37, P38

When an event disables A, B becomes the new originating primary, with A98 as the latest transaction in its database, and starts processing application logic to maintain business continuity. In this case where P is not ahead of B, the Receiver Server at P can remain up after A crashes. When B connects, its Source Server and P"s Receiver Server confirms that B is not behind P with respect to updates received from A, and SI replication from B picks up where replication from A left off.

-

O: ... A95, A96, A97, A98, B61, B62

S: ... P34, A95, P35, P36, A96, A97, P37, P38, A98, P39, B61, P40

P operating as a supplementary instance to B replicates transactions processed on B, and also applies its own locally generated updates. Although A98 was originally generated on A, P received it from B because A97 was the common point between B and P.

... A95, A96, A97, A98, A99

O: ... A95, A96, A97, A98, B61, B62, B63, B64

S: ... P34, A95, P35, P36, A96, A97, P37, P38, A98, P39, B61, P40, B62, B63

P, continuing as a supplementary instance to B, replicates transactions processed on B, and also applies its own locally generated updates. A meanwhile has been repaired and brought online. It has to roll transaction A99 off its database into an Unreplicated Transaction Log before it can start operating as a replicating secondary instance to B.

R: ... A95, A96, A97, A98, B61, B62, B63, B64

O: ... A95, A96, A97, A98, B61, B62, B63, B64, B65

S: ... P34, A95, P35, P36, A96, A97, P37, P38, A98, P39, B61, P40, B62, B63, P41, B64

Having rolled off transactions into an Unreplicated Transaction Log, A can now operate as a replicating secondary instance to B. This is normal BC Logical Multi-Site operation. B and P continue operating as originating primary instance and supplementary instance.

The following example creates this switchover scenario:

source ./gtmenv A V6.3-000A_x86_64
$gtm_dist/mumps -r ^%XCMD 'set ^A(98)=99'
source ./gtmenv B V6.3-000A_x86_64
./replicating_stop
source ./gtmenv A V6.3-000A_x86_64
$gtm_dist/mumps -r ^%XCMD 'set ^A(99)=100'
./originating_stop
source ./gtmenv B V6.3-000A_x86_64
./originating_start B A 4010
./originating_start B P 4011
./backup_repl startB
$gtm_dist/mumps -r ^%XCMD 'set ^B(61)=0'
source ./gtmenv P V6.3-000A_x86_64
./suppl_setup M startB 4011 -updok
$gtm_dist/mumps -r ^%XCMD 'for i=39:1:40 set ^P(i)=i'
source ./gtmenv B V6.3-000A_x86_64
$gtm_dist/mumps -r ^%XCMD 'set ^B(62)=1,^B(63)=1'
source ./gtmenv A V6.3-000A_x86_64
./rollback 4010 backward
./replicating_start A 4010
source ./gtmenv B V6.3-000A_x86_64
$gtm_dist/mumps -r ^%XCMD 'set ^B(64)=1,^B(65)=1'
cat A/gtm.lost

The shutdown sequence is as follows:

source ./gtmenv B V6.3-000A_x86_64
./originating_stop
source ./gtmenv A V6.3-000A_x86_64 
./replicating_stop
source ./gtmenv P V6.3-000A_x86_64
./replicating_stop

A and P require rollback

The following demonstrates a switchover scenario from B←A→P to A←B→P where A and P have unreplicated updates that require rollback before B can become the new originating instance.

A

B

P

Comments

O: ... A95, A96, A97, A98, A99

R: ... A95, A96, A97

S: ... P34, A95, P35, P36, A96, A97, P37, P38, A98, P39, P40

A as an originating primary instance at transaction number A99, replicates to B as a BC replicating secondary instance at transaction number A97 and P as a SI that includes transaction number A98, interspersed with locally generated updates. Updates are recorded in each instance's journal files using before-image journaling.

Crashes

O: ... A95, A96, A97

... P34, A95, P35, P36, A96, A97, P37, P38, A98, P39, P40

When an event disables A, B becomes the new originating primary, with A97 the latest transaction in its database. P cannot immediately start replicating from B because the database states would not be consistent - while B does not have A98 in its database and its next update may implicitly or explicitly depend on that absence, P does, and may have relied on A98 to compute P39 and P40.

-

O: ... A95, A96, A97, B61, B62

S: ... P34, A95, P35, P36, A96, A97, P37, P38, B61

For P to accept replication from B, it must roll off transactions generated by A, (in this case A98) that B does not have in its database, as well as any additional transactions generated and applied locally since transaction number A98 from A.[a] This rollback is accomplished with a MUPIP JOURNAL -ROLLBACK -FETCHRESYNC operation on P.[b] These rolled off transactions (A98, P39, P40) go into the Unreplicated Transaction Log and can be subsequently reprocessed by application code.[c] Once the rollback is completed, P can start accepting replication from B.[d] B in its Originating Primary role processes transactions and provides business continuity, resulting in transactions B61 and B62.

-

O: ... A95, A96, A97, B61, B62, B63, B64

S: ... P34, A95, P35, P36, A96, A97, P37, P38, B61, B62, P39a, P40a, B63

P operating as a supplementary instance to B replicates transactions processed on B, and also applies its own locally generated updates. Note that P39a & P40a may or may not be the same updates as the P39 & P40 previously rolled off the database.

[a] As this rollback is more complex, may involve more data than the regular LMS rollback, and may involve reading journal records sequentially; it may take longer.

[b] In scripting for automating operations, there is no need to explicitly test whether B is behind P - if it is behind, the Source Server will fail to connect and report an error, which automated shell scripting can detect and effect a rollback on P followed by a reconnection attempt by B. On the other hand, there is no harm in P routinely performing a rollback before having B connect - if it is not ahead, the rollback will be a no-op. This characteristic of replication is unchanged from releases prior to V5.5-000.

[c] GT.M's responsibility for them ends once it places them in the Unreplicated Transaction Log.

[d] Ultimately, business logic must determine whether the rolled off transactions can simply be reapplied or whether other reprocessing is required. GT.M's $ZQGBLMOD() function can assist application code in determining whether conflicting updates may have occurred.

The following example creates this scenario.

source ./gtmenv A V6.3-000A_x86_64
./db_create
./repl_setup
./originating_start A B 4010
./originating_start A P 4011
./backup_repl startA
$gtm_dist/mumps -r ^%XCMD 'for i=1:1:97 set ^A(i)=i'
source ./gtmenv B V6.3-000A_x86_64
./db_create
./repl_setup
./replicating_start B 4010
source ./gtmenv P V6.3-000A_x86_64
./db_create
./suppl_setup P startA 4011 -updok
$gtm_dist/mumps -r ^%XCMD 'for i=1:1:40 set ^P(i)=i'
source ./gtmenv B V6.3-000A_x86_64 
./replicating_stop
source ./gtmenv A V6.3-000A_x86_64
$gtm_dist/mumps -r ^%XCMD 'set ^A(98)=99'
source ./gtmenv P V6.3-000A_x86_64
./replicating_stop 
source ./gtmenv A V6.3-000A_x86_64
$gtm_dist/mumps -r ^%XCMD 'set ^A(99)=100'
./originating_stop
source ./gtmenv B V6.3-000A_x86_64
./originating_start B A 4010
./originating_start B P 4011
./backup_repl startB
$gtm_dist/mumps -r ^%XCMD 'set ^B(61)=0,^B(62)=1'
source ./gtmenv P V6.3-000A_x86_64
./rollback 4011 backward
./suppl_setup P startB 4011 -updok
$gtm_dist/mumps -r ^%XCMD 'for i=39:1:40 set ^P(i)=i'
source ./gtmenv A V6.3-000A_x86_64
./rollback 4010 backward
./replicating_start A 4010
source ./gtmenv B V6.3-000A_x86_64
$gtm_dist/mumps -r ^%XCMD 'set ^B(64)=1,^B(65)=1'
cat A/gtm.lost
cat P/gtm.lost

The shutdown sequence is as follows:

source ./gtmenv B V6.3-000A_x86_64
./originating_stop
source ./gtmenv A V6.3-000A_x86_64 
./replicating_stop
source ./gtmenv P V6.3-000A_x86_64
./replicating_stop

Rollback not required by application design

The following scenario demonstrates a switchover from B←A→P to A←B→P when A and P have unreplicated updates. By application design, unreplicated updates on P do not require rollback when B becomes the new originating instance.

A

B

P

Comments

O: ... A95, A96, A97, A98, A99

R: ... A95, A96, A97

S: ... P34, A95, P35, P36, A96, A97, P37, P38, A98, P39, P40

A as an originating primary instance at transaction number A99, replicates to B as a BC replicating secondary instance at transaction number A97 and P as a SI that includes transaction number A98, interspersed with locally generated updates. Updates are recorded in each instance's journal files using before-image journaling.

Crashes

O: ... A95, A96, A97, B61, B62

... P34, A95, P35, P36, A96, A97, P37, P38, A98, P39, P40

When an event disables A, B becomes the new originating primary, with A97 the latest transaction in its database and starts processing application logic. Unlike the previous example, in this case, application design permits (or requires) P to start replicating from B even though B does not have A98 in its database and P may have relied on A98 to compute P39 and P40.

-

O: ... A95, A96, A97, B61, B62

S: ... P34, A95, P35, P36, A96, A97, P37, P38, A98, P39, P40, B61, B62

With its Receiver Server started with the -noresync option, P can receive a SI replication stream from B, and replication starts from the last common transaction shared by B and P. Notice that on B no A98 precedes B61, whereas it does on P, i.e., P was ahead of B with respect to the updates generated by A.

The following example creates this scenario.

source ./gtmenv A V6.3-000A_x86_64
./db_create
./repl_setup
./originating_start A B 4010
./originating_start A P 4011
./backup_repl startA
$gtm_dist/mumps -r ^%XCMD 'for i=1:1:97 set ^A(i)=i'
source ./gtmenv B V6.3-000A_x86_64
./db_create
./repl_setup
./replicating_start B 4010 
source ./gtmenv P V6.3-000A_x86_64
./db_create
./suppl_setup P startA 4011 -updok
$gtm_dist/mumps -r ^%XCMD 'for i=1:1:40 set ^P(i)=i'
source ./gtmenv B V6.3-000A_x86_64 
./replicating_stop
source ./gtmenv A V6.3-000A_x86_64
$gtm_dist/mumps -r ^%XCMD 'set ^A(98)=99'
source ./gtmenv P V6.3-000A_x86_64
./replicating_stop 
source ./gtmenv A V6.3-000A_x86_64
$gtm_dist/mumps -r ^%XCMD 'set ^A(99)=100'
./originating_stop
source ./gtmenv B V6.3-000A_x86_64
./originating_start B A 4010
./originating_start B P 4011
#./backup_repl startB
$gtm_dist/mumps -r ^%XCMD 'set ^B(61)=0,^B(62)=1'
source ./gtmenv P V6.3-000A_x86_64
./replicating_start_suppl_n P 4011 -updok -noresync
$gtm_dist/mumps -r ^%XCMD 'for i=39:1:40 set ^P(i)=i'
source ./gtmenv A V6.3-000A_x86_64
./rollback 4010 backward
./replicating_start A 4010 
source ./gtmenv B V6.3-000A_x86_64
$gtm_dist/mumps -r ^%XCMD 'set ^B(64)=1,^B(65)=1'

The shutdown sequence is as follows:

source ./gtmenv B V6.3-000A_x86_64
./originating_stop
source ./gtmenv A V6.3-000A_x86_64
./replicating_stop
source ./gtmenv P V6.3-000A_x86_64
./replicating_stop

Rollback automatically

This scenario demonstrates the use of the -autorollback qualifier which performs a ROLLBACK ONLINE FETCHRESYNC under the covers.

A

B

P

Comments

O: ... A95, A96, A97, A98, A99

R: ... A95, A96, A97

S: ... P34, A95, P35, P36, A96, A97, P37, P38, A98, P39, P40

A as an originating primary instance at transaction number A99, replicates to B as a BC replicating secondary instance at transaction number A97 and P as a SI that includes transaction number A98, interspersed with locally generated updates. Updates are recorded in each instance"s journal files using before-image journaling.

R: Rolls back to A97 with A98 and A99 in the Unreplicated Transaction Log.

O: A95, A96, A97

S:Rolls back A98, P38, and P40

Instances receiving a replication stream from A can be configured to rollback automatically when A performs an online rollback by starting the Receiver Server with -autorollback. If P"s Receiver Server is so configured, it will roll A98, P39 and P40 into an Unreplicated Transaction Log. This scenario is straightforward. With the -noresync qualifier, the Receiver Server can be configured to simply resume replication without rolling back.

The following example run this scenario.

source ./gtmenv A V6.3-000A_x86_64
./db_create
./repl_setup
./originating_start A P 4000
./originating_start A B 4001
source ./gtmenv B V6.3-000A_x86_64
./db_create
./repl_setup
./replicating_start B 4001
source ./gtmenv A V6.3-000A_x86_64
./backup_repl startA 
source ./gtmenv P V6.3-000A_x86_64
./db_create
./suppl_setup P startA 4000 -updok
$gtm_dist/mumps -r %XCMD 'for i=1:1:38 set ^P(i)=i'
source ./gtmenv A V6.3-000A_x86_64
$gtm_dist/mumps -r %XCMD 'for i=1:1:97 set ^A(i)=i'
source ./gtmenv B V6.3-000A_x86_64
./replicating_stop
source ./gtmenv A V6.3-000A_x86_64
$gtm_dist/mumps -r %XCMD 'set ^A(98)=50'
source ./gtmenv P V6.3-000A_x86_64
$gtm_dist/mumps -r %XCMD 'for i=39:1:40 set ^P(i)=i'
./replicating_stop
source ./gtmenv A V6.3-000A_x86_64
$gtm_dist/mumps -r %XCMD 'set ^A(99)=100'
./originating_stop
source ./gtmenv B V6.3-000A_x86_64
./originating_start B A 4001 
./originating_start B P 4000
source ./gtmenv A V6.3-000A_x86_64
./replicating_start A 4001 -autorollback
source ./gtmenv P V6.3-000A_x86_64
#./rollback 4000 backward
./replicating_start_suppl_n P 4000 -updok -autorollback
#./replicating_start_suppl_n P 4000 -updok 

The shutdown sequence is as follows:

source ./gtmenv A V6.3-000A_x86_64
./replicating_stop
source ./gtmenv P V6.3-000A_x86_64
./replicating_stop
source ./gtmenv B V6.3-000A_x86_64
./originating_stop

Switchover possibilities in a B←A→P→Q replication configuration

Consider a situation where A and P are located in one data center, with BC replication to B and Q respectively, located in another data center. When the first data center fails, the SI replication from A to P is replaced by SI replication from B to Q. The following scenario describes a switchover from B←A→P→Q to A←B→Q→P with unreplicated updates on A and P.

A

B

P

Q

Comments

O: ... A95, A96, A97, A98, A99

R: ... A95, A96, A97, A98

S: ... P34, A95, P35, P36, A96, P37, A97, P38

R: ... P34, A95, P35, P36, A96, P37

A as an originating primary instance at transaction number A99, replicates to B as a BC replicating secondary instance at transaction number A98 and P as a SI that includes transaction number A97, interspersed with locally generated updates. P in turn replicates to Q.

Goes down with the data center

O: ... A95, A96, A97, A98, B61, B62

Goes down with the data center

... P34, A95, P35, P36, A96, P37

When a data center outage disables A, and P, B becomes the new originating primary, with A98 as the latest transaction in its database and starts processing application logic to maintain business continuity. Q can receive the SI replication stream from B, without requiring a rollback since the receiver is not ahead of the source.

-

O: ... A95, A96, A97, A98, B61, B62

-

S: ... P34, A95, P35, P36, A96, P37, A97, A98, Q73, B61, Q74, B62

Q receives SI replication from B and also applies its own locally generated updates. Although A97 and A98 were originally generated on A, Q receives them from B. Q also computes and applies locally generated updates

... A95, A96, A97, A98, A99

O: ... A95, A96, A97, A98, B61, B62, B63, B64

... P34, A95, P35, P36, A96, P37, A97,A98, P38

S: ... P34, A95, P35, P36, A96, P37, A97, A98, Q73, B61, Q74, B62, Q75, B63, Q76, B64

While B and Q, keep the enterprise in operation, the first data center is recovered. Since A has transactions in its database that were not replicated to B when the latter started operating as the originating primary instance, and since P had transactions that were not replicated to Q when the latter took over, A and P must now rollback their databases and create Unreplicated Transaction Files before receiving BC replication streams from B and Q respectively. A rolls off A99, P rolls off P38.

R: ... A95, A96, A97, B61, B62, B63, B64

O: ... A95, A96, A97, B61, B62, B63, B64, B65

R: ... P34, A95, P35, P36, A96, P37, A97, A98, Q73, B61, Q74, B62, Q75, B63, Q76, B64

S: ... P34, A95, P35, P36, A96, P37, A97, A98, Q73, B61, Q74, B62, Q75, B63, Q76, B64, Q77

Having rolled off their transactions into Unreplicated Transaction Logs, A can now operate as a BC replicating instance to B and P can operate as the SI replicating instance to Q. B and Q continue operating as originating primary instance and supplementary instance. P automatically receives M38 after applying the Unreplicated Transaction Log (from P) to Q. A and P automatically receive A99 after applying the Unreplicated Transaction Log (from A) to B.

The following example runs this scenario.

source ./gtmenv A V6.3-000A_x86_64
./db_create
./repl_setup
./originating_start A P 4000
./originating_start A B 4001
source ./gtmenv B V6.3-000A_x86_64
./db_create
./repl_setup
./replicating_start B 4001
source ./gtmenv A V6.3-000A_x86_64
./backup_repl startA 
source ./gtmenv P V6.3-000A_x86_64
./db_create
./suppl_setup P startA 4000 -updok
./backup_repl startP
./originating_start P Q 4005
source ./gtmenv Q V6.3-000A_x86_64
./db_create
./suppl_setup Q startP 4005 -updnotok
source ./gtmenv A V6.3-000A_x86_64
$gtm_dist/mumps -r ^%XCMD 'for i=1:1:96 set ^A(i)=i'
source ./gtmenv P V6.3-000A_x86_64
$gtm_dist/mumps -r ^%XCMD 'for i=1:1:37 set ^P(i)=i'
source ./gtmenv Q V6.3-000A_x86_64
./replicating_stop
source ./gtmenv P V6.3-000A_x86_64
$gtm_dist/mumps -r ^%XCMD 'set ^P(38)=1000'
./replicating_stop
source ./gtmenv A V6.3-000A_x86_64 
$gtm_dist/mumps -r ^%XCMD 'set ^A(97)=1000,^A(98)=1000'
source ./gtmenv B V6.3-000A_x86_64
./replicating_stop
source ./gtmenv A V6.3-000A_x86_64 
$gtm_dist/mumps -r ^%XCMD 'set ^A(99)=1000'
./originating_stop 
source ./gtmenv B V6.3-000A_x86_64
backup_repl startB
./originating_start B Q 4008
$gtm_dist/mumps -r ^%XCMD 'for i=1:1:62 set ^B(i)=i'
source ./gtmenv Q V6.3-000A_x86_64
./rollback 4008 backward
./suppl_setup Q startB 4008 -updok
$gtm_dist/mumps -r ^%XCMD 'for i=1:1:74 set ^Q(i)=i'
source ./gtmenv B V6.3-000A_x86_64
$gtm_dist/mumps -r ^%XCMD 'for i=63:1:64 set ^B(i)=i'
./originating_start B A 4004
source ./gtmenv A V6.3-000A_x86_64
./rollback 4004 backward
./replicating_start A 4004
source ./gtmenv Q V6.3-000A_x86_64
$gtm_dist/mumps -r ^%XCMD 'for i=75:1:76 set ^Q(i)=i'
./originating_start Q P 4007
./backup_repl startQ
source ./gtmenv P V6.3-000A_x86_64
./rollback 4007 backward
./replicating_start_suppl_n P 4007 -updnotok
source ./gtmenv Q V6.3-000A_x86_64
$gtm_dist/mumps -r ^%XCMD 'set ^Q(77)=1000'
cat A/gtm.lost
cat P/gtm.lost

The shutdown sequence is as follows:

source ./gtmenv P V6.3-000A_x86_64
./replicating_stop
source ./gtmenv A V6.3-000A_x86_64
./replicating_stop
source ./gtmenv Q V6.3-000A_x86_64
./replicating_stop
./originating_stop
source ./gtmenv B V6.3-000A_x86_64
./originating_stop

Changing the global directory in an A→B replication configuration

In a replication configuration, a global directory provides information to map global updates to their respective database files. As replication processes pick the state of the global directory at process startup, any change made to the global directory requires process restarts (at a minimum) to bring that change into effect. A switchover mechanism can ensure application availability while making global directory changes.

On B:

  1. Shut down the Receiver Server and the Update Process.

  2. Make a copy of the current global directory.

  3. If the globals you are moving have triggers, make a copy of their definitions with MUPIP TRIGGER -SELECT and delete them with MUPIP TRIGGER.

  4. Update the global directory.

  5. If you are rearranging the global name spaces which do not contain any data, skip to step 9.

  6. Create a backup copy of B, turn off replication, and cut the previous links of the journal file.

  7. Use the MERGE command to copy a global from the prior to the new location. Use extended references (to the prior global directory) to refer to global in the prior location.

  8. If the globals you are moving have triggers, apply the definitions saved in step 3.

  9. Turn replication on for the region of the new global location.

  10. Make B the new originating instance. For more information, refer to “Switchover possibilities in an A→B replication configuration”.

On A:

  1. Shutdown replication.

  2. If the globals you are moving have triggers, make a copy of their definitions with MUPIP TRIGGER -SELECT and delete them with MUPIP TRIGGER; note if the triggers are the same as those on B, which they normally would be for a BC instance you can just delete them and use the definitions saved on B.

  3. Update the global directory.

  4. If you are rearranging the global name spaces which do not contain any data, skip to step 7.

  5. Create a backup copy of A, turn off replication, and cut the previous links of the journal file.

  6. Use the MERGE command to copy a global from the prior to the new location. Use extended references (to the prior global directory) to refer to global in the prior location.

  7. If the globals you are moving have triggers, apply the definitions saved in step 1.

  8. Turn replication on for the region of the new global location.

  9. Make A the new replicating instance.

Perform a switchover to return to the A->B configuration. Once normal operation resumes, remove the global from the prior location (using extended references) to release space.

If a switchover mechanism is not in place and a downtime during the global directory update is acceptable, follow these steps:

On B:

  • Perform steps 1 to 9.

  • Restart the Receiver Server and the Update Process.

On A:

  • Bring down the application (or prevent new updates from getting started).

  • Perform Steps 1 to 8.

  • Restart the originating instance.

  • Restart the active Source Server.

  • Bring up the application.

This example adds the mapping for global ^A to a new database file A.dat in an A->B replication configuration.

source ./gtmenv A V6.3-000A_x86_64 
./db_create
./repl_setup
./originating_start A B 4001
source ./gtmenv B V6.3-000A_x86_64
./db_create
./repl_setup
./replicating_start B 4001
source ./gtmenv A V6.3-000A_x86_64 
$gtm_dist/mumps -r %XCMD 'for i=1:1:10 set ^A(i)=i'
./repl_status
source ./gtmenv B V6.3-000A_x86_64
./replicating_stop
cp B/gtm.gld B/prior.gld
$gtm_dist/mumps -r ^GDE @updgld
./db_create
mkdir backup_B
$gtm_dist/mupip backup "*" backup_B  -replinst=backup_B/gtm.repl
$gtm_dist/mupip set -journal=on,before_images,filename=B/gtm.mjl -noprevjnlfile -region "DEFAULT"
$gtm_dist/mumps -r %XCMD 'merge ^A=^|"B/prior.gld"|A'
$gtm_dist/mupip set -replication=on -region AREG
./originating_start B A 4001
source ./gtmenv A V6.3-000A_x86_64 
./originating_stop
./rollback 4001 backward
cat A/gtm.lost  #apply lost transaction file on A. 
./replicating_start A 4001
./replicating_stop 
cp A/gtm.gld A/prior.gld
$gtm_dist/mumps -r ^GDE @updgld
./db_create
mkdir backup_A
$gtm_dist/mupip backup "*" backup_A -replinst=backup_A/gtm.repl
$gtm_dist/mupip set -journal=on,before_images,filename=A/gtm.mjl -noprevjnlfile -region "DEFAULT"
$gtm_dist/mumps -r %XCMD 'merge ^A=^|"A/prior.gld"|A'
$gtm_dist/mupip set -replication=on -region AREG
./replicating_start A 4001
./repl_status
#Perform a switchover to return to the A->B configuration. Remove the global in the prior location to release space with a command like Kill ^A=^|"A/prior.gld"|A'. 

The shutdown sequence is as follows:

source ./gtmenv A V6.3-000A_x86_64
./replicating_stop
source ./gtmenv B V6.3-000A_x86_64
./originating_stop

Rolling Software Upgrade

A rolling software upgrade is the procedure of upgrading an instance in a way that there is minimal impact on the application uptime. An upgrade may consist of changing the underlying database schema, region(s), global directory, database version, application version, triggers, and so on. There are two approaches for a rolling upgrade. The first approach is to upgrade the replicating instance and then upgrade the originating instance. The second approach is to upgrade the originating instance first while its replicating (standby) instance acts as an originating instance.

The following two procedures demonstrate these rolling software upgrade approaches for upgrading an A→B replication configuration running an application using GT.M V6.1-000_x86_64 to GT.M V6.2-001_x86_64 with minimal (a few seconds) impact on the application downtime.

Upgrade the replicating instance first (A→B)

On B:

  1. Shut down the passive Source and Receiver Servers and the application.

  2. Turn off replication.

  3. Perform a MUPIP RUNDOWN operation and make a backup.

  4. Open DSE, run DUMP -FILEHEADER for each region (FIND -REGION=<Region_Name>) and note down the largest Region Seqno.

  5. Upgrade the instance. An upgrade may include include adding triggers, adding/removing regions, changing GDE mapping, and so on.

  6. Open DSE again, run DSE DUMP -FILEHEADER for each region (FIND -REGION=<Region_Name>) and note down the largest Region Seqno. If the largest Region Seqno noted in step 4 and largest Region Seqno noted in this step are the same, proceed to step 7. Otherwise, execute DSE CHANGE -FILEHEADER - REG_SEQNO=<Largest_Region_Seqno_from_step_4> for the region having the largest Region Seqno.

  7. Cut the back links to the prior generation journal files with a command like:

    $gtm_dist/mupip set -journal=on,before_images,filename=B/gtm.mjl -noprevjnlfile -region "DEFAULT"
  8. Turn on replication.

  9. If the use of replication filters apply to your situation, bring up the replicating instance with the new-to-old filter on the Source Server of A, and the old-to-new filter on the Receiver Server of B. Otherwise, bring up the replicating instance on B.

  10. Wait for B to automatically catup up the pending updates from A.

on A:

  1. When there are no/low updates on A, shut down the Source Server.

  2. Turn off replication.

  3. Perform a MUPIP RUNDOWN and make a backup copy of the database.

  4. Perform a switchover to make B the originating instance. Apply lost/broken transactions, if any, on B.

  5. Open DSE, run DUMP -FILEHEADER for each region (FIND -REGION=<Region_Name>) and note down the largest Region Seqno.

  6. Upgrade the instance. An upgrade may include include adding triggers, adding/removing regions, changing GDE mapping, and so on.

  7. Open DSE again, run DSE DUMP -FILEHEADER for each region (FIND -REGION=<Region_Name>) and note down the largest Region Seqno. If the largest Region Seqno noted in step 5 and largest Region Seqno noted in this step are the same, proceed to step 8. Otherwise, execute DSE CHANGE -FILEHEADER -REG_SEQNO=<Largest_Region_Seqno_from_step_5> for the region having the largest Region Seqno.

  8. Cut the back links to the prior generation journal files with a command like:

    $gtm_dist/mupip set -journal=on,before_images,filename=A/gtm.mjl -noprevjnlfile -region DEFAULT
  9. Turn on replication.

  10. Start the Receiver Server of A.

Upgrade the originating instance first (A→B)

on A:

  1. When there are no updates on A and both A and B are in sync, shut down the Source Server.

  2. Turn off replication.

  3. Perform a MUPIP RUNDOWN and make a backup copy of the database.

  4. Perform a switchover to make B the originating instance. This ensure application availability during the upgrade of A.

  5. Open DSE, run DUMP -FILEHEADER for each region (FIND -REGION=<Region_Name>) and note down the largest Region Seqno.

  6. Upgrade the instance. An upgrade may include include adding triggers, adding/removing regions, changing GDE mapping, upgrading the GT.M version, and so on.

  7. Open DSE again, run DSE DUMP -FILEHEADER for each region (FIND -REGION=<Region_Name>) and note down the largest Region Seqno. If the largest Region Seqno noted in step 5 and largest Region Seqno noted in this step are the same, proceed to step 8. Otherwise, execute DSE CHANGE -FILEHEADER -REG_SEQNO=<Largest_Region_Seqno_from_step_5> for the region having the largest Region Seqno.

  8. Cut the back links to the prior generation journal files with a command like:

    $gtm_dist/mupip set -journal=on,before_images,filename=A/gtm.mjl -noprevjnlfile -region DEFAULT
  9. Turn on replication.

  10. If the use of replication filters apply to your situation, bring up the Receiver Server with the old-to-new filter. Otherwise bring up the Receiver Server.

  11. Wait for A to automatically catch up the pending updates from B.

on B:

  1. When there are no/low updates on A, shut down the Source Server.

  2. Turn off replication.

  3. Perform a MUPIP RUNDOWN and make a backup copy of the database.

  4. Perform a switchover to reinstate A as the originating instance. This ensures application availability during the upgrade of B.

  5. Open DSE, run DUMP -FILEHEADER for each region (FIND -REGION=<Region_Name>) and note down the largest Region Seqno.

  6. Upgrade the instance. An upgrade may include include adding triggers, adding/removing regions, changing GDE mapping, and so on.

  7. Open DSE again, run DSE DUMP -FILEHEADER for each region (FIND -REGION=<Region_Name>) and note down the largest Region Seqno. If the largest Region Seqno noted in step 5 and largest Region Seqno noted in this step are the same, proceed to step 8. Otherwise, execute DSE CHANGE -FILEHEADER -REG_SEQNO=<Largest_Region_Seqno_from_step_5> for the region having the largest Region Seqno.

  8. Cut the back links to the prior generation journal files with a command like:

    $gtm_dist/mupip set -journal=on,before_images,filename=B/gtm.mjl -noprevjnlfile -region DEFAULT
  9. Turn on replication.

  10. Start the Receiver Server of B.

  11. The upgrade of A and B is complete.

[Important] Note on Triggers

While adding triggers, bear in mind that triggers get replicated if you add them when replication is turned on. However, when you add triggers when replication is turned off, those triggers and the database updates resulting from the executing their trigger code do not get replicated.

Here is an example to upgrade A and B deployed in an A→B replication configuration from V6.1-000_x86_64 to V6.2-001_x86_84. This example uses instructions from the “Upgrade the originating instance first (A→B)” procedure.

source ./env A V6.1-000_x86_64
./db_create
./repl_setup
./originating_start A B 4001
source ./env B V6.1-000_x86_64
./db_create
./repl_setup
./replicating_start B 4001
source ./env A V6.1-000_x86_64
$gtm_dist/mumps -r %XCMD 'for i=1:1:100 set ^A(i)=i'
./status
source ./env B V6.1-000_x86_64
./replicating_stop
source ./env A V6.1-000_x86_64
./status
./originating_stop 
$gtm_dist/mupip set -replication=off -region "DEFAULT"
$gtm_dist/dse dump -f 2>&1| grep "Region Seqno"
#Perform a switchover to make B the originating instance. 
source ./env A V6.2-001_x86_64
$gtm_dist/mumps -r ^GDE exit
$gtm_dist/mupip set -journal=on,before_images,filename=A/gtm.mjl -noprevjnlfile -region "DEFAULT"
#Perform the upgrade 
$gtm_dist/dse dump -fileheader 2>&1| grep "Region Seqno"
#If Region Seqno is greater than the Region Seqno noted previously, run $gtm_dist/dse change -fileheader -req_seqno=<previously_noted_region_seqno>.
./repl_setup
#A is now upgraded to V6.2-001_x86_64 and is ready to resume the role of the originating instance. Shutdown B and reinstate A as the originating instance. 
./originating_start A B 4001
source ./env B V6.2-001_x86_64
$gtm_dist/mumps -r ^GDE exit
$gtm_dist/mupip set -journal=on,before_images,filename=B/gtm.mjl -noprevjnlfile -region "DEFAULT"
#Perform the upgrade 
$gtm_dist/dse dump -fileheader 2>&1| grep "Region Seqno"
#If Region Seqno is different, run $gtm_dist/dse change -fileheader -req_seqno=<previously_noted_region_seqno>.
$gtm_dist/dse dump -f 2>&1| grep "Region Seqno"
#If Region Seqno is greater than the Region Seqno noted previously, run $gtm_dist/dse change -fileheader -req_seqno=<previously_noted_region_seqno>.
./repl_setup
./replicating_start B 4001

The shutdown sequence is as follows:

source ./env B V6.2-001_x86_64
./replicating_stop
source ./env A V6.2-001_x86_64
./originating_stop

Shutting down an instance

To shutdown an originating instance:

  • Shut down all GT.M and mupip processes that might be attached to the Journal Pool.

  • In case the originating instance is also a supplementary instance, shutdown the Receiver Server(s) (there might be more than one Receiver Server in future GT.M versions).

  • Shut down all active and/or passive Source Servers.

  • Execute mupip rundown -region to ensure that the database, Journal Pool, and Receiver Pool shared memory is rundown properly.

To shutdown a propagating instance:

  • Shut down all replicating instance servers (Receiver Server, Update Process and its Helper processes).

  • Shutdown the originating instance servers (all active and/or passive Source Servers).

  • On its replicating instances, ensure that there are no GT.M or MUPIP processes attached to the Journal Pool as updates are disabled (they are enabled only on the originating instance).

  • Execute mupip rundown -region to ensure that the database, Journal Pool, and Receiver Pool shared memory is rundown properly.

Creating a new Replication Instance File

You do not need to create a new replication instance file except when you upgrade from a GT.M version prior to V5.5-000. Unless stated in the release notes of your GT.M version, you instance file does not need to be upgraded. If you are creating a new replication instance file for any administration purpose, remember that doing do so will remove history records which may prevent it from resuming replication with other instances. To create a new replication instance file, follow these steps:

  • Shut down all mumps, MUPIP and DSE processes except Source and Receiver Server processes; then shut down the Receiver Server (and with it, the Update Process) and all Source Server processes. Use MUPIP RUNDOWN to confirm that all database files of the instance are closed and there are no processes accessing them.

  • Create a new replication instance file (you need to provide the instance name and instance file name, either with command line options or in environment variables, as described in other examples of this section):

    • If this is instance is to receive SI replication or to receive BC replication from an instance that receives SI replication, use the command:

       mupip replicate -instance_create -supplementary
    • Otherwise use the command:

      mupip replicate -instance_create
    • If a replication instance file already exists, these commands will create a backup copy of the replicating instance and then create a new replication instance file. If you want to prevent accidental overwriting your existing replication instance file, use the -noreplace qualifier with these commands.

  • Prepare it to accept a replication stream:

    • Start a passive Source Server. If this is an SI replicating instance, use the -updok flag to start the passive Source Server.

    • Start the Receiver Server using the updateresync. For versions prior to V5.5-000 use the -updateresync qualifier and for GT.M versions V5.5-000 or newer use -updateresync=<repl_inst>.For example, mupip replicate -receiver -start -updateresync=<repl_inst> where repl_inst is the prior replication file if the source is V5.5-000 or newer and -updateresync if it is an older GT.M release.

  • Start a Source Server on a root or propagating primary instance to replicate to this instance. Verify that updates on the source instance are successfully replicated to the receiver instance.

The -updateresync qualifier indicates that instead of negotiating a mutually agreed common starting point for synchronization the operator is guaranteeing the receiving instance has a valid state that matches the source instance currently or as some point in the past. Generally this means the receiving instance has just been updated with a backup copy from the source instance.

On instances with the same endian-ness, follow these steps to create a replication instance file without using the -updateresync qualifier.

On the source side:

  • Use the MUPIP BACKUP command with the -REPLINSTANCE qualifier to backup the instance to be copied. The source server for the instance must be started at least once before backing up the replication instance file.

  • Ship the backed up databases and instance file to the receiving side.

On the receiving side:

  • Run the MUPIP REPLIC -EDITINST command on the backed up instance file to change the instance name to reflect the target instance. This makes the source replication instance file usable on the target instance while preserving the history records in the instance file.

  • Create new journal files, start a passive Source Server and a Receiver Server (without an -updateresync qualifier).

  • Allow a Source Server to connect.

[Note]

When the instances have different endian-ness, create a new replication instance file as described in Creating the Replication Instance File

Setting up a secured TLS replication connection

The following example creates two instances (Alice and Bob) and a basic framework required for setting up a TLS replication connection between them. Alice and Bob are fictional characters from https://en.wikipedia.org/wiki/Alice_and_Bob and represent two instances who use the certificates signed by the same demo root CA. This example is solely for the purpose of explaining the general steps required to encrypt replication data in motion. You must understand, and appropriately adjust, the scripts before using them in a production environment. Note that all certificates created in this example are for the sake of explaining their roles in a TLS replication environment. For practical applications, use certificates signed by a CA whose authority matches your use of TLS.

  1. Remove the comment tags from the following lines in the gtmenv script:

    export gtmcrypt_config=$PWD/$gtm_repl_instname/config_file
    echo -n "Enter Password for gtmtls_passwd_${gtm_repl_instname}: ";export gtmtls_passwd_${gtm_repl_instname}="`$gtm_dist/plugin/gtmcrypt/maskpass|tail -n 1|cut -f 3 -d " "`"
  2. Execute the gtmenv script as follows:

    $ source ./gtmenv Alice V6.2-001_x86_64

    This creates a GT.M environment for replication instance name Alice. When prompted, enter a password for gtmtls_passwd_Alice.

  3. ./db_create

    This creates the global directory and the database for instance Alice.

  4. Create a demo root CA, leaf-level certificate, and a $gtmcrypt_config file with a tlsid called Alice for instance Alice. Note that in this example, $gtmcrypt_config is set to $PWD/Alice/config_file. For more information on creating the $gtmcrypt_config file and the demo certificates required to run this example, refer to Appendix G: “Creating a $gtmcrypt_config file.

    Your $gtmcrypt_config file should look something like:

    tls: {
        verify-depth: 7;
         CAfile: "/path/to/certs/ca.crt";
         Alice: {
               format: "PEM";
               cert: "/path/to/certs/Alice.crt";
               key: "/path/to/certs/Alice.key";
         };
      };
  5. Turn replication on and create the replication instance file:

    $ ./repl_setup
  6. Start the originating instance Alice:

    $ ./originating_start Alice Bob 4001 -tlsid=Alice -reneg=2

On instance Bob:

  1. Execute the gtmenv script as follows:

    $ source ./gtmenv Bob V6.2-001_x86_64

    This creates a GT.M environment for replication instance name Bob. When prompted, enter a password for gtmtls_passwd_Bob.

  2. $ ./db_create

    This creates the global directory and the database for instance Bob.

  3. Create a leaf-level certificate and a $gtmcrypt_config file with a tlsid called Bob for instance Bob. Note that in this example, $gtmcrypt_config is set to $PWD/Bob/config_file. Note that you would use the demo CA that you created before to sign this leaf-level certificate. For replication to proceed, both leaf-level certificates must be signed by the same root CA. For more information, refer to Appendix G: “Creating a $gtmcrypt_config file.

    Your $gtmcrypt_config file should look something like:

    tls: {
        verify-depth: 7;
         CAfile: "/path/to/certs/ca.crt";
         Bob: {
               format: "PEM";
               cert: "/path/to/certs/Bob.crt";
               key: "/path/to/certs/Bob.key";
         };
      };
  4. Turn replication on and create the replication instance file:

    $ ./repl_setup
  5. Start the replicating instance Bob.

    $ ./replicating_start Bob 4001 -tlsid=Bob

For subsequent environment setup, use the following commands:

source ./gtmenv Bob V6.2-001_x86_64 or source ./gtmenv Alice V6.2-001_x86_64
./replicating_start Bob 4001 -tlsid=Bob or ./originating_start Alice Bob 4001 -tlsid=Alice -reneg=2

Schema Change Filters

Filters between the originating and replicating systems perform rolling upgrades that involve database schema changes. The filters manipulate the data under the different schemas when the software revision levels on the systems differ.

GT.M provides the ability to invoke a filter; however, an application developer must write the filters specifically as part of each application release upgrade when schema changes are involved.

Filters should reside on the upgraded system and use logical database updates to update the schema before applying those updates to the database. The filters must invoke the replication Source Server (new schema to old) or the database replication Receiver Server (old schema to new), depending on the system's status as either originatig or replicating. For more information on Filters, refer to “Filters”.

Recovering from the replication WAS_ON state

If you notice the replication WAS_ON state, correct the cause that made GT.M turn journaling off and then execute MUPIP SET -REPLICATION=ON.

To make storage space available, first consider moving unwanted non-journaled and temporary data. Then consider moving the journal files that predate the last backup. Moving the currently linked journal files is a very last resort because it disrupts the back links and a rollback or recover will not be able to get back past this discontinuity unless you are able to return them to their original location.

If the replication WAS_ON state occurs on the originating side:

If the Source Server does not reference any missing journal files, -REPLICATION=ON resumes replication with no downtime.

If the Source Server requires any missing journal file, it produces a REPLBRKNTRANS or NOPREVLINK error and shuts down. Note that you cannot rollback after journaling turned off because there is insufficient information to do such a rollback.

In this case, proceed as follows:

  1. Take a backup (with MUPIP BACKUP -BKUPDBJNL=OFF -REPLINST=<bckup_inst>) of the originating instance.

  2. Because journaling was turned off in the replication WAS_ON state, the originating instance cannot be rolled back to a state prior to the start of the backup. Therefore, cut the previous generation link of the journal files on the originating instance and turn replication back on. Both these operations can be accomplished with MUPIP SET -REPLICATION="ON".

  3. Restore the replicating instance from the backup of the originating instance. Change the instance name of <bckup_inst> to the name of the replicating instance (with MUPIP REPLIC -EDITINST -NAME).

  4. Turn on replication and journaling in the restored replicating instance. Specify the journal file pathname explicitly with MUPIP SET -JOURNAL=filename=<repinst_jnl_location> (as the backup database has the originating instance's journal file pathname).

  5. Restart the Source Server process on the originating instance.

  6. Start the Receiver Server (with no -UPDATERESYNC) the replicating instance.

If the replication WAS_ON state occurs on the receiving side:

Execute MUPIP SET -REPLICATION=ON to return to the replication ON state. This resumes normal replication on the receiver side. As an additional safety check, extract the journal records of updates that occurred during the replication WAS_ON state on the originating instance and randomly check whether those updates are present in the receiving instance.

If replication does not resume properly (due to errors in the Receiver Server or Update Process), proceed as follows:

  1. Take a backup (with MUPIP BACKUP -BKUPDBJNL=OFF -REPLINST=<bckup_inst>) of the originating instance.

  2. Restore the replicating instance from the backup of the originating instance. Change the instance name of <bckup_inst> to the name of the replicating instance (with MUPIP REPLIC -EDITINST -NAME).

  3. Turn on replication and journaling in the restored replicating instance. Specify the journal file pathname explicitly with MUPIP SET -JOURNAL=filename=<repinst_jnl_location> (as the backup database has the originating instance's journal file pathname).

  4. Restart the Source Server process on the originating instance. Normally, the Source Server might still be running on the originating side.

  5. Start the Receiver Server (with no -UPDATERESYNC) the replicating instance.

Rollback data from crashed (idle) regions

When a rollback operations fails with CHNGTPRSLVTM, NOPREVLINK, and JNLFILEOPENERR messages, evaluate whether you have a crashed region in your global directory that is seldom used for making updates (idle). The updates in an idle region's current generation journal file may have timestamps and sequence numbers that no longer exist in the prior generation journal file chains of more frequently updated regions because of periodic pruning of existing journal files as part of routine maintenance. MUPIP SET and BACKUP commands can also remove previous generation journal file links.

Terminating a process accessing an idle region abnormally (say with kill -9 or some other catastrophic event) may leave its journal files improperly closed. In such an case, the discrepancy may go unnoticed until the next database update or rollback. Performing a rollback including such an idle region may then resolve the unified rollback starting time, (reported with a CHNGTPRSLVTM message), to a point in time that does not exist in the journal file chain for the other regions, thus causing the rollback to fail.

In this rare but possible condition, first perform a rollback selectively for the idle region(s). Here are the steps:

  1. Create a temporary global directory which maps only the idle region(s) by mapping one, often the only, such idle region's file to the default region.

  2. Set the environment variable gtmgbldir to point to the temporary global directory.

  3. Perform an optimal rollback (MUPIP JOURNAL -ROLLBACK -BACKWARD "*")

  4. Analyze and process any broken or lost transaction files that the rollback procedure may generate.

  5. Set the environment variable gtmgbldir to point back to the location of the global directory for your application.

  6. Perform a normal rollback for your application.

You do not need to perform these steps if you have a non-replicated but journaled database because RECOVER operations do not coordinate across regions.

As a general practice, perform an optimal recovery/rollback every time when starting a GT.M application from quiescence and, depending on the circumstances, after a GT.M process terminates abnormally.

FIS recommends rotating journal files with MUPIP SET when removing old journal files or ensuring that all regions are periodically updated.

Setting up a new replicating instance of an originating instance (A→B, P→Q, or A→P)

To set up a new replicating instance of an originating instance for the first time or to replace a replicating instance if database and instance file get deleted, you need to create the replicating instance from a backup of the originating instance, or one of its replicating instances.

If you are running GT.M V5.5-000 or higher:

  • Take a backup of the database and the replication instance file of the originating instance together at the same time with BACKUP -REPLINSTANCE and transfer them to the location of the replicating instance. If the originator's replicating instance file was newly created, take its backup while the Source Server is running to ensure that the backup contains at least one history record.

  • Use MUPIP REPLICATE -EDITINST -NAME=<secondary-instname> to change the replicating instance's name.

  • Start the replicating instance without -udpateresync.

If you are running GT.M pre-V5.5-000:

  • Create a new replication instance file on the replicating instance.

  • Start the replicating instance with this new replication instance file with the -updateresync qualifier (no value specified).

Replacing the replication instance file of a replicating instance (A→B and P→Q)

In this case, it is possible that the replicating instance's database files are older than the originating instance. Note that to resume replication there is no need to transfer the backup of the originating instance's database and replication instance files.

To replace the existing replicating instance file with a new replication instance file, follow these steps:

If you are running GT.M V5.5-000 or higher:

  • Take a backup of just the replication instance file (no database files with BACKUP -REPLINST=</path/to/bkup-orig-repl-inst-file>) and transfer it to the site of the replicating instance.

  • Start the replicating instance with -updateresync=</path/to/bkup-orig-repl-inst-file>.

In this case, the Receiver Server determines the current instance's journal sequence number by taking a maximum of the Region Sequence Numbers in the database file headers on the replicating instance and uses use the input instance file to locate the history record corresponding to this journal sequence number, and exchanges this history information with the Source Server.

If you are running GT.M pre-V5.5-000:

  • Create a new replication instance file on the replicating instance.

  • Start the replicating instance with this new instance file with the -updateresync qualifier (no value specified).

Replacing the replication instance file of a replicating instance (A→P)

On P:

  • Use the -SUPPLEMENTARY qualifier with the MUPIP REPLICATE -INSTANCE_CREATE command to indicate this is a supplementary instance file.

  • Start a Source Server on P with -UPDOK to indicate local updates are enabled on P.

  • Start the Receiver Server on P with the -UPDATERESYNC=</path/to/bkup-orig-repl-inst-file> qualifier and -RESUME. -RESUME indicates that A and P had been replicating before. The Receiver Server looks at the local (stream #0) sequence numbers in the database file headers on P and takes the maximum value to determine the journal sequence number of the new stream on P. It then uses this as the instance journal sequence number on A to resume replication.

Setting up a new replicating instance from a backup of the originating instance (A→P)

On A:

  • Take a backup of the replication instance file and the database together at the same time with BACKUP -REPLINSTANCE and transfer it to P. If the A's replication instance file was also freshly created, then take the backup while the Source Server is running on the originating instance. This ensures that the backed up replication instance file contains at least one history record.

On P:

  • Create a new replication instance file. Use the -SUPPLEMENTARY qualifier with the MUPIP REPLICATE -INSTANCE_CREATE command to indicate this is a supplementary instance file.

  • Restore the database backups from A to P or use MUPIP LOAD to load the data.

  • Start a Source Server on P with -UPDOK to indicate local updates are enabled on P.

  • Start the Receiver Server on P with the -UPDATERESYNC=</path/to/bkup-orig-repl-inst-file> qualifier and -INITIALIZE. The -INITIALIZE indicates this is the first time A and P are replicating.

In this case, the Receiver Server uses the current journal sequence number in the </path/to/bkup-orig-repl-inst-file> as the point where A starts sending journal records. GT.M updates the stream sequence number of Stream # 1 in the instance file on P to reflect this value. From this point, GT.M maps the journal sequence number on A to a stream journal sequence number (for example, stream # 1) on P.

Setting up an A→P configuration for the first time if P is an existing instance (having its own set of updates)

On P:

  • Turn off passive Source Server and Receiver Server (if they are active).

  • Turn off replication and run down the database.

On A:

  • Take a backup of the replication instance file and the database together with BACKUP -REPLINSTANCE and transfer it to P. If A's instance file was also freshly created, take the backup while the Source Server is running on the originating instance. This ensures that the backed up replication instance file contains at least one history record.

On P:

  • Do not create a new instance file. Continue using the existing instance file to preserve updates that have already occurred on P.

  • Start the Source Server with -UPDOK to indicate that local updates are enabled on P.

  • Start the Receiver Server with the -UPDATERESYNC=</path/to/bkup-orig-repl-inst-file> qualifier and -INITIALIZE. The -INITIALIZE indicates this is the first time A and P are replicating.

The Receiver Server uses the current journal sequence number in the </path/to/bkup-orig-repl-inst-file> as the point where A starts sending journal records. GT.M updates the stream sequence number (for example, of Stream # 1) in the instance file on P to reflect this value. Going forward, the journal sequence number on A will always map to a stream journal sequence number (for example, of stream # 1) on P.