Author Topic: Beware ":b" in regular expressions - There be dragons...  (Read 11258 times)

Matthew

  • SlickEdit Team Member
  • Senior Community Member
  • Posts: 990
  • Hero Points: 44
Beware ":b" in regular expressions - There be dragons...
« on: July 28, 2006, 03:37:25 PM »
About once a week our support group receives an email from a customer whose regular expression either hangs their system or does not work as they expected. In almost every case the problem boils down to a misunderstanding of how and when to use :b (whitespace).

For users accustomed to Perl and .NET regex syntax, the SlickEdit :b can be misleading. The :b construct is actually defined as [ \t]# (SE syntax) or [ \t]+ (Unix). It already has a "maximal match" built into it, so it matches one or more whitespace characters on its own. This can cause a slow, grinding search if it is immediately followed by a * or +, e.g. :b* or :b+. In Perl or .NET syntax, if you want to search for something at the beginning of the line after some leading whitespace (a very frequent scenario when parsing source code), you'd typically write something like
Code: [Select]
^\s*(public|private|protected)
But you shouldn't place a * or + after a SlickEdit :b. At best, it may not work as you intended; at worst, it will bog down the search. There are two ways to handle this. First, if the whitespace is not optional, just use the :b as-is, with no quantifier.
Second, if the whitespace is optional, use the character class construct directly, as below:
Code: [Select]
^[ \t]*(public|private|protected)
You would also want to use the character class approach if you needed to perform a minimal match of whitespace.
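The difference between the two forms can be sketched outside the editor. The snippet below is only an illustration in Python's re module, not SlickEdit itself: it assumes :b can be modelled as the Unix-syntax class [ \t]+ given above, and the names B, nested, flat and keyword are made up for the example. It shows that :b* expands to a quantifier wrapped around another quantifier, while the recommended form uses a single quantifier and accepts the same lines.

```python
import re

# Assumption: SlickEdit's ":b" is modelled as "[ \t]+" (the Unix-syntax
# definition quoted in the post). Python's re stands in for the editor.
B = r"(?:[ \t]+)"

# ":b*" expands to a quantifier applied to an already-quantified group.
nested = re.compile(r"^" + B + r"*(public|private|protected)")

# Recommended form: one quantifier directly on the character class.
flat = re.compile(r"^[ \t]*(public|private|protected)")


def keyword(pattern, line):
    """Return the access keyword matched at the start of line, or None."""
    m = pattern.match(line)
    return m.group(1) if m else None


samples = ["public int x;", "    private:", "\t protected void f()", "  int y;"]

# Both patterns agree on every sample; only the amount of work differs.
results = [(keyword(nested, s), keyword(flat, s)) for s in samples]
for s, r in zip(samples, results):
    print(repr(s), r)
```

Both patterns accept and reject the same lines here. The nested form, however, lets a backtracking matcher regroup the same run of whitespace in many different ways before giving up on a non-matching line, which is the kind of extra work that can make a search grind on long runs of whitespace.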

We may add a :s construct in a future version. But for now, take care when using :b.

jbezem

  • Community Member
  • Posts: 87
  • Hero Points: 8
Re: Beware ":b" in regular expressions - There be dragons...
« Reply #1 on: July 31, 2006, 06:46:59 AM »
You just reminded me: I use (SE regex syntax) '(:b|)' for optional whitespace.
I used to have problems, though, with the equivalent '(|:b)' (in both cases without the quotes).
Any specific reason that jumps to mind? I haven't tried that lately, and definitely never with version 11.x, but I remember it quite well, since it cost me some hours if not days to figure out...

Regards,

Johan
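One possible reason the two orderings behave differently can be sketched in Python's re module. This is an assumption about backtracking engines in general, not a statement about SlickEdit's op-codes: Python tries alternatives left to right, so an empty first alternative is preferred whenever the rest of the pattern allows it. Here :b is again modelled as [ \t]+, and all variable names are hypothetical.

```python
import re

# Assumption: ":b" is modelled as "[ \t]+". SlickEdit's engine may order
# alternatives differently; this only shows Python's behaviour.
line = "   foo"

# Analogue of (:b|): whitespace is tried first, the empty branch second.
prefer_ws = re.match(r"([ \t]+|)", line).group(1)

# Analogue of (|:b): the empty branch is tried first and succeeds at once.
prefer_empty = re.match(r"(|[ \t]+)", line).group(1)

# When something must follow, both orderings end up consuming the
# whitespace, but the empty-first form gets there only after backtracking.
anchored_ws = re.match(r"([ \t]+|)foo", line).group(1)
anchored_empty = re.match(r"(|[ \t]+)foo", line).group(1)

print(repr(prefer_ws), repr(prefer_empty))
print(repr(anchored_ws), repr(anchored_empty))
```

In this model the two forms match the same lines when followed by more pattern, but they differ in what they capture on their own and in how much backtracking they cause.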

Matthew

  • SlickEdit Team Member
  • Senior Community Member
  • Posts: 990
  • Hero Points: 44
Re: Beware ":b" in regular expressions - There be dragons...
« Reply #2 on: July 31, 2006, 07:03:49 PM »
I haven't dug through the version history of the op-code processing, but some changes were made between 10.0.2 and 11.0.0 to try to prevent some of the "match-the-whole-world" problems. So it could be that the sequence of op-codes that (:b|) and/or (|:b) generated was changed to prevent hanging expressions.

srouleau

  • Community Member
  • Posts: 68
  • Hero Points: 4
Re: Beware ":b" in regular expressions - There be dragons...
« Reply #3 on: August 12, 2006, 12:44:31 AM »
Rather than adding :s, why not add support for PCRE (Perl Compatible Regular Expressions)?  It's a bit of a pain switching between Perl/grep/sed, MS's findstr, and SE.

Just wishful thinking :)

eblot

  • Junior Community Member
  • Posts: 4
  • Hero Points: 0
Re: Beware ":b" in regular expressions - There be dragons...
« Reply #4 on: August 12, 2006, 11:25:06 AM »
I've been asking the SE team for PCRE since release V8 or V9...
I guess they have not received enough requests from customers to add it (?)

hs2

  • Senior Community Member
  • Posts: 2761
  • Hero Points: 292
Re: Beware ":b" in regular expressions - There be dragons...
« Reply #5 on: August 12, 2006, 12:40:49 PM »
Besides the impact on the Slick codebase, it's difficult to decide which other reg-exp 'standards' to support.
E.g. POSIX is a real standard and widely accepted too.
A while ago I moved to Slick's reg-exps because they are unmatched in power (when using Slick, of course).

But I agree: it would be nice if the major (most used) reg-exps were supported.

HS2
« Last Edit: August 14, 2006, 09:02:58 AM by hs2 »

srouleau

  • Community Member
  • Posts: 68
  • Hero Points: 4
Re: Beware ":b" in regular expressions - There be dragons...
« Reply #6 on: August 14, 2006, 01:53:23 PM »
Well, I've never seen the Slick codebase, but you'd think adding a new reg-exp would be pretty localised. 

Then again, as with a lot of code that has some history attached to it, the code may be organically growing in strange, and bold, directions making such an addition difficult and unlikely.

Steph.