Topic: Perl RegEx: \w matches $

jporkkahtc

  • Senior Community Member
  • Posts: 2620
  • Hero Points: 210
Perl RegEx: \w matches $
« on: September 02, 2022, 11:10:28 PM »
\w should not match $.

According to https://perldoc.perl.org/perlre
It should match [a-zA-Z0-9_]
Slick matches [a-zA-Z0-9_$.]
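
To make the difference concrete, here is a small Python sketch (not SlickEdit code). The two character sets are written out explicitly, because Python's own \w is Unicode-aware and would blur the comparison; the sample string is made up for illustration.

```python
import re

# ASCII word set that perlre describes for \w (as quoted in the post).
PERL_WORD = re.compile(r"[a-zA-Z0-9_]+")
# The set SlickEdit was observed to match for \w (adds '$' and '.').
SLICK_WORD = re.compile(r"[a-zA-Z0-9_$.]+")

# Hypothetical sample line, chosen to contain '$' and '.'.
sample = "$total = obj.count_1;"

perl_tokens = PERL_WORD.findall(sample)
slick_tokens = SLICK_WORD.findall(sample)

print(perl_tokens)   # ['total', 'obj', 'count_1']
print(slick_tokens)  # ['$total', 'obj.count_1']
```

With the wider set, "$total" and "obj.count_1" each come back as a single word, which is the kind of unexpected match described above.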

jporkkahtc

  • Senior Community Member
  • Posts: 2620
  • Hero Points: 210
Re: Perl RegEx: \w matches $
« Reply #1 on: September 02, 2022, 11:15:06 PM »
Oh interesting ... The RegEx tool uses WordChars defined on Tools > Options > Languages > Miscellaneous Languages > Plain Text > General.
I'm not so sure that "\w" should be language sensitive -- seems like a SlickEdit extension.

Also, the SlickEdit help says this about \w:
Code:
Equivalent to [a-zA-Z0-9_]. Can also be used inside a character class.
The difference is also not mentioned in the help topic "Compatibility Issues With Perl Regular Expressions".

jporkkahtc

  • Senior Community Member
  • Posts: 2620
  • Hero Points: 210
Re: Perl RegEx: \w matches $
« Reply #2 on: September 02, 2022, 11:34:13 PM »
Looking a little more ...
[[:word:]] correctly matches [a-zA-Z0-9_]


WRT Perl/Posix character classes:

[[:ascii:]] doesn't work in SlickEdit.

[[:space:]] and [[:punct:]] are not documented in SlickEdit, but seem to work.

According to https://perldoc.perl.org/perlrecharclass#Negation-of-POSIX-character-classes,
Negation: [[:^digit:]] -- Does not work in Slickedit.
However, this does: [^[:digit:]]

I don't know why they would add the extra syntax. I suppose it makes it usable in a non-negated [] set.
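
A rough illustration of that guess, in Python. Python's re module has no POSIX classes, so \D stands in here for [:^digit:]; that substitution is an assumption made for the analogy, not Perl or SlickEdit behavior. The point is that a negated class used inside a normal set can be combined with other members, which negating the whole set cannot express.

```python
import re

text = "a15b"

# Negating the whole set: anything that is not a digit.
# Analogous to [^[:digit:]].
negated_whole = re.findall(r"[^\d]", text)

# A negated class used as one member of a non-negated set:
# "any non-digit, or the digit 5". Analogous to [[:^digit:]5].
union = re.findall(r"[\D5]", text)

print(negated_whole)  # ['a', 'b']
print(union)          # ['a', '5', 'b']
```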

Clark

  • SlickEdit Team Member
  • Senior Community Member
  • Posts: 6862
  • Hero Points: 528
Re: Perl RegEx: \w matches $
« Reply #3 on: September 03, 2022, 01:32:29 AM »
It looks like \w and \W are incorrect.  SlickEdit's intent for Perl regex is for everything to work as close to Perl as possible. Something that is a new syntax can do whatever. This does currently use your word characters settings BUT the on-line help DOES NOT indicate this for Perl regex or SlickEdit regex. Fortunately, there are no SlickEdit macros which use \w or \W in a SlickEdit regex. We will change \w and \W for Perl and SlickEdit regex.

I'll look into the [[...]] issues you've found. There's no docs on these because there are way too many. We just try to define all the same characters classes as Perl.

Clark

  • SlickEdit Team Member
  • Senior Community Member
  • Posts: 6862
  • Hero Points: 528
Re: Perl RegEx: \w matches $
« Reply #4 on: September 03, 2022, 01:57:26 AM »
This is worse than I thought. Not only do \w and \W use the language specific word characters, \b and \B also use the language specific word characters. IMO \b and \B are much more valuable when the language specific word characters are used. Usually programmers are working with variable names and not hard wired word characters. If SlickEdit had some other extension for these, it wouldn't be so bad to just remove these. Now I'm thinking it makes more sense to just deviate from Perl here and change the docs. I'll have to think about this some more.

jporkkahtc

  • Senior Community Member
  • Posts: 2620
  • Hero Points: 210
Re: Perl RegEx: \w matches $
« Reply #5 on: September 03, 2022, 04:15:03 AM »
Oh, right... \b is pretty useful as is.
Maybe an option with either (or both) syntax:  (?^OptionLetters:X), or a \o?
to select \w\W\b\B word character compatibility mode?

I hit it today because I was using SlickEdit to test/validate a regex I was using in PowerShell -- SlickEdit was matching text that it should not.


Clark

  • SlickEdit Team Member
  • Senior Community Member
  • Posts: 6862
  • Hero Points: 528
Re: Perl RegEx: \w matches $
« Reply #6 on: September 03, 2022, 03:27:12 PM »
I'm still thinking about what to do. I've noticed that there are more than 19 uses of \b, mostly in SlickEdit regular expressions. Most are in proc search functions or language specific searching. NONE override the word characters.

For example,
Code:
   int status = search('[.;,(){}]|<<|>>|->|\b(case|catch|end|fun|if|receive|try)\b', "-rh@XSC");

Technically the above has a potential bug and should be this:
Code:
   int status = search('[.;,(){}]|<<|>>|->|\b(case|catch|end|fun|if|receive|try)\b', "-rh@XSC,w=[a-zA-Z0-9_]");
Otherwise, if the user decides to change their word characters, the above code could break.
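
The breakage can be sketched outside SlickEdit. The Python below emulates \b for a configurable word-character set using lookarounds; this is only an approximation for illustration, not how SlickEdit implements \b or the w= option. The helper names (boundary, keyword_pattern), the user setting that adds '$', and the sample text are all hypothetical.

```python
import re

def boundary(word_chars: str) -> str:
    """Regex fragment emulating \\b for the given character-class body."""
    w = f"[{word_chars}]"
    # Boundary = start of a word (non-word before, word after)
    #         or end of a word (word before, non-word after).
    return f"(?:(?<!{w})(?={w})|(?<={w})(?!{w}))"

def keyword_pattern(word_chars: str) -> re.Pattern:
    """Keyword search from the post, with \\b replaced by the emulation."""
    b = boundary(word_chars)
    return re.compile(f"{b}(?:case|catch|end|fun|if|receive|try){b}")

PERL_CHARS = "a-zA-Z0-9_"    # fixed word characters, like w=[a-zA-Z0-9_]
USER_CHARS = "a-zA-Z0-9_$"   # hypothetical user setting that adds '$'

text = "if $end then try"

fixed_hits = keyword_pattern(PERL_CHARS).findall(text)
user_hits = keyword_pattern(USER_CHARS).findall(text)

print(fixed_hits)  # ['if', 'end', 'try']
print(user_hits)   # ['if', 'try']
```

Once '$' counts as a word character, there is no boundary between '$' and "end", so the same search silently stops finding that keyword. Pinning the word characters in the search options avoids depending on the user's setting.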

I found one Perl case which uses the pos() function and doesn't override the word characters (it's an extra argument).

ALL of these places I found need some sort of fix.

Here are the solutions I'm thinking about:

Solution A:

   * Change the documentation
   * Fix the macro uses of \b to specify the word characters

   CONS
       * More deviations from Perl regex which are not necessary

Solution B:
   * Change \w, \W, \b, \B to use hardwired Perl word characters for both Perl and SlickEdit regex.
   * Have \o<, \o> use the user defined word characters for both Perl and SlickEdit syntax.
   * Need to add \ob to match a user defined word boundary and \oB to match a user defined non-word boundary.
   * Need to add configurable wordchars options. Some needed for more precise Vim regex support.

         \ow<[char-set]    Match beginning of word
         \ow>[char-set]    Match end of word
         \owb[char-set]    Match at word boundary
         \owB[char-set]    Match at non-word boundary
   
   * \w, \W should use hardwired Perl word characters inside character set definition
   * \ow, \oW should use user defined word characters inside character set definition.
   * Change the documentation
   * Change the macro uses of \b to use \o<, \o>, and specify the word characters

   CONS
       * Could break user macros/expectations

Solution C:
   * Change \w, \W, \b, \B to use hardwired Perl word characters BUT do not make this change for SlickEdit regex
   * Have \o<, \o> use the user defined word characters for both Perl and SlickEdit syntax.
   * Need to add \ob to match a user defined word boundary and \oB to match a user defined non-word boundary.
   * Need to add configurable wordchars options. Some needed for more precise Vim regex support.

         \ow<[char-set]    Match beginning of word
         \ow>[char-set]    Match end of word
         \owb[char-set]    Match at word boundary
         \owB[char-set]    Match at non-word boundary
   
   * \w, \W should use hardwired Perl word characters inside Perl character set definition
   * \ow, \oW should use user defined word characters inside character set definition.
   * Change the documentation
   * Fix the macro uses of \b to specify the word characters

   CONS
       * Could break user macros/expectations but change is limited to Perl syntax which is used less in macros.
« Last Edit: September 03, 2022, 05:24:38 PM by Clark »

Clark

  • SlickEdit Team Member
  • Senior Community Member
  • Posts: 6862
  • Hero Points: 528
Re: Perl RegEx: \w matches $
« Reply #7 on: September 04, 2022, 01:21:09 AM »
To summarize:

Solution A allows Perl deviations for \b, \B, \w, \W
Solution B fixes the Perl deviations, adds word capabilities that are exactly the same for Perl and SlickEdit
Solution C fixes the Perl deviations and adds some word capabilities, but existing SlickEdit word escapes will keep working the same. The downside is that the Perl word escapes, despite their similar syntax, would be documented differently from the SlickEdit ones.

I am leaning toward solution B because I prefer when escapes and docs match for Perl and SlickEdit.


Clark

  • SlickEdit Team Member
  • Senior Community Member
  • Posts: 6862
  • Hero Points: 528
Re: Perl RegEx: \w matches $
« Reply #8 on: September 04, 2022, 09:55:20 AM »
Went with solution B. It turns out that the hardwired Perl word characters for \b worked for almost all the macros that it was used in. Also, there was no need to add \ob and \oB, which would use the configured word characters. I've left them out for now.
« Last Edit: September 04, 2022, 10:28:48 AM by Clark »