$ZPATNumeric
$ZPATN[UMERIC] is a read-only intrinsic special variable that determines how GT.M interprets the patcode “N” used in the pattern match operator.
With $ZPATNUMERIC=“UTF-8”, the patcode “N” matches any decimal digit in the Unicode character set (general category Nd), not only the ASCII digits. With $ZPATNUMERIC=“M”, GT.M restricts the patcode “N” to the ASCII digits 0-9 (that is, ASCII 48-57).
When a process starts in UTF-8 mode, GT.M initializes $ZPATNUMERIC from the environment variable gtm_patnumeric: to “UTF-8” if gtm_patnumeric is defined as “UTF-8”, and to “M” if gtm_patnumeric is undefined or has any other value. Because $ZPATNUMERIC is read-only, the process cannot change the value after initialization.
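The initialization rule can be modeled with a short Python sketch. This is not GT.M code: the function name initial_zpatnumeric is hypothetical, the sketch covers only a process started in UTF-8 mode, and it assumes the comparison against “UTF-8” is an exact, case-sensitive match, which the text above does not spell out.

```python
def initial_zpatnumeric(environ):
    """Model the starting value of $ZPATNUMERIC for a UTF-8 mode process.

    `environ` is a mapping of environment variable names to values,
    for example os.environ or a plain dict.
    """
    # Assumption: only the exact string "UTF-8" selects UTF-8 behavior.
    if environ.get("gtm_patnumeric") == "UTF-8":
        return "UTF-8"
    # Undefined, or defined to any other value: fall back to "M".
    return "M"
```

Under these assumptions, an empty environment and an environment with gtm_patnumeric set to some other value both yield “M”.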
For UTF-8 characters, GT.M assigns patcodes based on the default classification of the Unicode® character set by the ICU library with three adjustments:
- If $ZPATNUMERIC is not “UTF-8”, non-ASCII decimal digits are classified as A.
- Non-decimal numerics (Nl and No) are classified as A.
- The remaining characters (those not classified by the ICU functions u_isalpha, u_isdigit, u_ispunct, or u_iscntrl, or by the two adjustments above) are classified into either patcode P or C. GT.M uses the ICU function u_isprint to make this distinction, since it returns “TRUE” for non-control characters.
The following table shows the resulting mapping from Unicode general category to M patcode:
| General category as per the Unicode® standard | GT.M patcode class |
|---|---|
| L* (all letters) | A |
| M* (all marks) | P |
| Nd (decimal numbers) | N (if the decimal digit is ASCII or $ZPATNUMERIC is “UTF-8”, otherwise A) |
| Nl (letter numbers) | A (examples of Nl are Roman numerals) |
| No (other numbers) | A (examples of No are fractions) |
| P* (all punctuation) | P |
| S* (all symbols) | P |
| Zs (spaces) | P |
| Zl (line separators) | C |
| Zp (paragraph separators) | C |
| C* (all control code points) | C |
For a description of the Unicode general categories, refer to http://unicode.org/charts/.
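The mapping in the table can be approximated outside GT.M with a short Python sketch. It uses Python's unicodedata module in place of the ICU library, so results may differ from GT.M for characters whose classification varies between Unicode versions or between the two libraries. The function name patcode and its zpatnumeric parameter are hypothetical and exist only for this illustration.

```python
import unicodedata


def patcode(ch, zpatnumeric="M"):
    """Approximate the patcode class (A, N, P, or C) of one character.

    Follows the general-category table above; it does not call ICU.
    """
    cat = unicodedata.category(ch)  # two-letter category such as "Lu" or "Nd"
    if cat == "Nd":
        # Decimal digits: N for ASCII 0-9, or for any decimal digit when
        # zpatnumeric is "UTF-8"; otherwise A.
        if "0" <= ch <= "9" or zpatnumeric == "UTF-8":
            return "N"
        return "A"
    if cat[0] == "L" or cat in ("Nl", "No"):
        return "A"  # letters, letter numbers, other numbers
    if cat[0] in "MPS" or cat == "Zs":
        return "P"  # marks, punctuation, symbols, spaces
    return "C"      # Zl, Zp, and every C* category
```

For instance, patcode("\u0d67", "UTF-8") gives "N" for the Malayalam digit 1, while patcode("\u0d67", "M") gives "A", matching the Nd row of the table.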
Example:
GTM>write $zpatnumeric
UTF-8
GTM>Write $Char($$FUNC^%HD("D67"))?.N ; This is the Malayalam decimal digit 1
1
GTM>Write 1+$Char($$FUNC^%HD("D67"))
1
GTM>Write 1+$Char($$FUNC^%HD("31")) ; This is the ASCII digit 1
2