$ZPATN[UMERIC] is a read-only intrinsic special variable that determines how GT.M interprets the patcode "N" used in the pattern match operator.
With $ZPATNUMERIC="UTF-8", the patcode "N" matches any Unicode decimal digit (general category Nd). With $ZPATNUMERIC="M", GT.M restricts the patcode "N" to the ASCII digits 0-9 (ASCII 48-57).

When a process starts in UTF-8 mode, $ZPATNUMERIC takes its value from the environment variable gtm_patnumeric. GT.M initializes $ZPATNUMERIC to "UTF-8" if gtm_patnumeric is defined as "UTF-8", and to "M" if gtm_patnumeric is undefined or set to any other value. GT.M sets $ZPATNUMERIC only at process initialization and does not allow the process to change it.
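The initialization rule can be modeled as a small function. This is a minimal sketch, not GT.M code: `initial_zpatnumeric` is a hypothetical helper, it covers only a process started in UTF-8 mode, and it assumes an exact, case-sensitive comparison against "UTF-8".

```python
import os
from typing import Mapping, Optional


def initial_zpatnumeric(env: Optional[Mapping[str, str]] = None) -> str:
    """Model how a UTF-8 mode process picks its $ZPATNUMERIC value.

    Hypothetical helper: assumes an exact, case-sensitive match on "UTF-8".
    """
    if env is None:
        env = os.environ
    # Only the value "UTF-8" selects Unicode decimal digits for patcode N.
    if env.get("gtm_patnumeric") == "UTF-8":
        return "UTF-8"
    # Undefined, or any other value, falls back to "M" (ASCII 0-9 only).
    return "M"


print(initial_zpatnumeric({"gtm_patnumeric": "UTF-8"}))  # UTF-8
print(initial_zpatnumeric({"gtm_patnumeric": "M"}))      # M
print(initial_zpatnumeric({}))                            # M
```

Because the value is fixed at startup, a process that needs the other behavior must be restarted with a different gtm_patnumeric setting.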
Warning | |
---|---|
GT.M performs operations on literals at compile time, and the pattern code settings can affect such operations. Therefore, always compile with the same pattern code settings as those used at runtime. |
For UTF-8 characters, GT.M assigns patcodes based on the default classification of the Unicode® character set by the ICU library, with three adjustments:
1. If $ZPATNUMERIC is not "UTF-8", non-ASCII decimal digits are classified as A.
2. Non-decimal numerics (Nl and No) are classified as A.
3. The remaining characters (those not classified by the ICU functions u_isalpha, u_isdigit, u_ispunct, or u_iscntrl, or by adjustment 1 or 2 above) are classified as either patcode P or C. GT.M uses the ICU function u_isprint for this step because it returns TRUE for non-control characters.
The following table shows the resulting mapping from Unicode general categories to GT.M patcodes:
General category as per the Unicode® standard | GT.M patcode class |
---|---|
L* (all letters) | A |
M* (all marks) | P |
Nd (decimal numbers) | N if the decimal digit is ASCII or $ZPATNUMERIC is "UTF-8"; otherwise A |
Nl (letter numbers) | A (examples of Nl are Roman numerals) |
No (other numbers) | A (examples of No are fractions) |
P* (all punctuation) | P |
S* (all symbols) | P |
Zs (spaces) | P |
Zl (line separators) | C |
Zp (paragraph separators) | C |
C* (all control code points) | C |
For a description of the Unicode general categories, refer to http://unicode.org/charts/.
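The mapping in the table can be expressed as a lookup on a character's general category. The following is a minimal Python sketch under stated assumptions: `gtm_patcode` is a hypothetical helper, Python's `unicodedata` module stands in for ICU (its Unicode version may differ from the ICU version GT.M uses), unassigned code points (Cn) are assumed to fall under C*, and GT.M's other patcodes (such as U, L, and E) are not modeled.

```python
import unicodedata


def gtm_patcode(ch: str, zpatnumeric: str = "M") -> str:
    """Approximate the GT.M patcode class (A, N, P, or C) of one character.

    Hypothetical helper: uses Python's unicodedata categories, not ICU.
    """
    cat = unicodedata.category(ch)  # two-letter code, e.g. "Lu", "Nd", "Zs"
    major = cat[0]
    if major == "L":  # L*: all letters
        return "A"
    if cat == "Nd":  # decimal digits depend on $ZPATNUMERIC
        if ch.isascii() or zpatnumeric == "UTF-8":
            return "N"
        return "A"
    if major == "N":  # Nl and No: letter numbers and other numbers
        return "A"
    if major in ("M", "P", "S") or cat == "Zs":  # marks, punctuation, symbols, spaces
        return "P"
    # Remaining: Zl, Zp, and C* (Cc, Cf, Cs, Co, Cn)
    return "C"


print(gtm_patcode(chr(0x0D67)))           # A  (Malayalam digit one, $ZPATNUMERIC="M")
print(gtm_patcode(chr(0x0D67), "UTF-8"))  # N
print(gtm_patcode(chr(0x216B)))           # A  (Roman numeral twelve, category Nl)
```

Ordering the checks so that Nd is tested before the general N* case mirrors the table: only decimal digits can ever be N, and only when they are ASCII or $ZPATNUMERIC is "UTF-8".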
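Example:

    GTM>write $zpatnumeric
    UTF-8
    GTM>Write $Char($$FUNC^%HD("D67"))?.N ; This is the Malayalam decimal digit 1
    1
    GTM>Write 1+$Char($$FUNC^%HD("D67"))
    1
    GTM>Write 1+$Char($$FUNC^%HD("31")) ; This is the ASCII digit 1
    2

With $ZPATNUMERIC="UTF-8", the Malayalam digit matches the pattern ?.N. However, $ZPATNUMERIC does not change how arithmetic converts strings to numbers: adding 1 to the Malayalam digit yields 1, while adding 1 to the ASCII digit 1 yields 2.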
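The pattern-match part of the example can be modeled in a few lines. This is a minimal sketch, assuming that ?.N means "zero or more characters of patcode N" and using Python's `unicodedata` as a stand-in for ICU; `matches_dot_n` is a hypothetical helper, and the sketch does not model M arithmetic.

```python
import unicodedata


def matches_dot_n(s: str, zpatnumeric: str) -> bool:
    """Model the M pattern ?.N: every character must have patcode N."""
    for ch in s:
        # Patcode N covers only decimal digits (general category Nd).
        if unicodedata.category(ch) != "Nd":
            return False
        # Non-ASCII decimal digits count as N only when $ZPATNUMERIC="UTF-8".
        if not ch.isascii() and zpatnumeric != "UTF-8":
            return False
    return True  # an empty string also matches ?.N


malayalam_one = chr(0x0D67)  # same code point as $Char($$FUNC^%HD("D67"))
print(int(matches_dot_n(malayalam_one, "UTF-8")))  # 1
print(int(matches_dot_n(malayalam_one, "M")))      # 0
```

Under $ZPATNUMERIC="M", the same match would return 0, because the Malayalam digit falls back to patcode A.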