MUTEXLCKALERT, Mutual Exclusion subsystem ALERT - Lock attempt threshold crossed for region rrrr. Process pppp is in crit cycle cccc.
Run Time Error: This warning indicates that a process could not obtain a critical section lock for region rrrr even after waiting longer than the GT.M-determined threshold (approximately 32 seconds) because the critical section lock was held that entire time by another process pppp. cccc is the crit cycle count, which GT.M increases by one every time it successfully grants the mutual exclusion (mutex) lock to a process. cccc provides a measure of the frequency of mutex lock use. MUTEXLCKALERT messages indicate that process pppp is blocking access to region rrrr for inappropriately long periods of time, thereby impacting performance for other processes needing access to that region.
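When monitoring operator logs, it can help to extract rrrr, pppp and cccc from each alert automatically. The following is a minimal Python sketch built only from the message template quoted above. The function name `parse_mutexlckalert`, the `%GTM-W-` prefix in the sample line, and the assumption that pppp and cccc appear as decimal integers are illustrative and should be checked against the log format at your site.

```python
import re

# Pattern derived from the message template above. Log prefixes (timestamps,
# severity tags, and so on) vary by site, so we search rather than match.
# Assumption: pppp and cccc are printed as decimal integers.
MUTEXLCKALERT_RE = re.compile(
    r"MUTEXLCKALERT, Mutual Exclusion subsystem ALERT - "
    r"Lock attempt threshold crossed for region (?P<region>\S+)\. "
    r"Process (?P<pid>\d+) is in crit cycle (?P<cycle>\d+)\."
)


def parse_mutexlckalert(line):
    """Return (region, pid, crit_cycle), or None if line is not a MUTEXLCKALERT."""
    m = MUTEXLCKALERT_RE.search(line)
    if m is None:
        return None
    return m.group("region"), int(m.group("pid")), int(m.group("cycle"))


if __name__ == "__main__":
    # Hypothetical sample line for illustration only.
    sample = (
        "%GTM-W-MUTEXLCKALERT, Mutual Exclusion subsystem ALERT - "
        "Lock attempt threshold crossed for region /tmp/mumps.dat. "
        "Process 12345 is in crit cycle 678."
    )
    print(parse_mutexlckalert(sample))
```

Because GT.M increments cccc each time it grants the mutex, successive alerts that name the same pppp with an unchanged cccc suggest that the lock has not changed hands between them.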
GT.M produces this warning when:
A process owning a critical section dies (most likely because of a kill -9) and the OS gives its PID to another process. To reclaim the inappropriately held critical section, GT.M first checks whether the process is alive and whether it holds the critical section. On finding that the process is alive but does not hold the critical section, GT.M concludes that it is not safe to free the critical section and alerts the operator with this message.
The process holding the critical section is using a non-Isolated command such as ZSYSTEM, BREAK or a timed command in a way that creates a deadlock or a live-lock. GT.M attempts to limit this by limiting the time a process using one of these commands can hold a critical section, but your use of non-Isolated commands and your settings for $ZMAXTPTIME and/or the environment variable $gtm_tpnotacidtime may be such that you get MUTEXLCKALERT messages.
There is an IO bottleneck that caused GT.M to slow down: GT.M detects that process pppp is currently using the critical section lock.
Note: GT.M blocks signals during MUTEXLCKALERT warnings. This means that GT.M defers error handling as a result of TP timeout ($ZMAXTPTIME), interrupt handler invocation, $ztimeout action, MUPIP STOP, etc. until the mutex is released. For example, a process may have a 10-second $ZMAXTPTIME, but GT.M may not execute the error handler until materially later, after the MUTEXLCKALERT condition has cleared.
Action: Monitor the system to determine whether there is a process with process id pppp and whether that process is a GT.M process.
Implement a script to get a stack trace for process pppp or take other appropriate action, and use the $gtm_procstuckexec environment variable to activate it before the blocked process sends the MUTEXLCKALERT message.
Identify and terminate process pppp to release control of that resource. If the process is a GT.M process, use MUPIP STOP to terminate it. If it is a process of another application, use an appropriate mechanism to stop it.
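The checks described above (is there a process pppp, is it a GT.M process, and how to capture its stack) can be sketched in Python as follows. This is an illustrative helper under stated assumptions, not part of GT.M: it assumes a POSIX system, uses the Linux /proc filesystem to find the executable name, treats the names in `GTM_EXECUTABLES` as a site-specific guess, and assumes gdb is the debugger available for stack traces. How GT.M invokes the script named by $gtm_procstuckexec is not covered here; consult the GT.M administration documentation for that interface.

```python
import os


def pid_is_alive(pid):
    """Return True if a process with this PID currently exists (POSIX)."""
    try:
        os.kill(pid, 0)  # signal 0 only performs error checking; nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def executable_name(pid):
    """Best-effort base name of the executable via Linux /proc; None if unavailable."""
    try:
        return os.path.basename(os.readlink(f"/proc/{pid}/exe"))
    except OSError:
        return None


# Assumption: executable names that identify GT.M processes at this site.
GTM_EXECUTABLES = {"mumps", "mupip", "dse", "lke"}


def looks_like_gtm(pid):
    """Return True if the executable name of pid is in the assumed GT.M set."""
    return executable_name(pid) in GTM_EXECUTABLES


def stack_trace_command(pid):
    """Build (but do not run) a gdb command that prints backtraces for all threads."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


if __name__ == "__main__":
    pid = os.getpid()  # stand-in for pppp from the alert
    print(pid_is_alive(pid), looks_like_gtm(pid), stack_trace_command(pid))
```

Remember that a PID can be reused after the original lock holder dies, so a live process with PID pppp that is not a GT.M process corresponds to the first cause listed above; stop it only with a mechanism appropriate to that application.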
If this message is due to an IO bottleneck, adopt a strategy that reduces IO. Some of the IO reducing strategies are:
Revisit your database configuration parameters (especially block size, number of global buffers, journal buffers, and so on) to see if you can make improvements.
Create a separate region (database) for temporary globals and do not replicate them.
Consider whether a different database access method and journaling strategy could improve throughput while satisfying your operational needs.
Consider tuning your filesystem.
For application configurations with large numbers of concurrent processes and/or large process memory footprints, consider placing object code in shared libraries on GT.M editions that support it. This may free system memory which the OS can use for its file system cache, or which you can use to increase the number of global buffers.
Do not apply IO reduction strategies all at once. Try them one at a time and always verify/measure the results of each strategy.