SlickEdit Product Discussion > SlickEdit®

Beware ":b" in regular expressions - There be dragons...

Matthew:
About once a week our support group receives an email from a customer whose regular expression hangs their system or does not behave as they predicted. In almost every case the problem boils down to a misunderstanding of how and when to use :b (whitespace).

For users accustomed to Perl and .NET regex syntax, the SlickEdit :b can be misleading. The :b construct is actually defined as [ \t]# (SE syntax) or [ \t]+ (Unix), so it already has a "maximal match" built into it. This can cause a slow, grinding search if it is immediately followed by a * or +, e.g. :b* or :b+. In Perl or .NET syntax, if you want to search for something at the beginning of the line after some leading whitespace (a very frequent scenario when parsing source code), you'd typically write something like
--- Code: ---^\s*(public|private|protected)
--- End code ---

But you shouldn't place a * or + after a SlickEdit :b. At best, it may not work as you intended; at worst, it will bog down the search. There are two ways to handle this. First, if the whitespace is not optional, just use the :b as-is, with no quantifiers.
Second, if the whitespace is optional, use the character class construct directly, as below:
--- Code: ---^[ \t]*(public|private|protected)
--- End code ---
You would also want to use the character class approach if you needed to perform a minimal match of whitespace.
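
To see why a quantifier after :b is risky, here is a minimal Python sketch. It assumes :b can be modeled by the Unix-syntax definition [ \t]+ given above. The expand_b helper is hypothetical, and Python's re engine is only a stand-in for SlickEdit's, so timings and failure modes will differ.

```python
import re

# Assumption: :b behaves like the Unix-syntax class [ \t]+ quoted in the post.
B_EQUIVALENT = r"(?:[ \t]+)"


def expand_b(pattern: str) -> str:
    """Hypothetical helper: substitute every ':b' with its assumed equivalent."""
    return pattern.replace(":b", B_EQUIVALENT)


# ':b*' expands to a quantifier wrapped around another quantifier.
nested = expand_b(r"^:b*(public|private|protected)")

# The recommended form uses the character class with a single quantifier.
safe = r"^[ \t]*(public|private|protected)"


def agree(line: str) -> bool:
    """True when both patterns give the same match/no-match answer."""
    return (re.match(nested, line) is None) == (re.match(safe, line) is None)


lines = ["public int x;", "    private void f()", "\tprotected:", "  int y;"]
print(nested)
print(all(agree(line) for line in lines))
```

On short inputs like these the two patterns agree. The difference is that the nested form gives a backtracking engine many ways to split one run of whitespace, which is one way a search can slow down badly when the rest of the pattern fails to match.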

We may add a :s construct in a future version. But for now, take care when using :b.

jbezem:
You just reminded me: I use (SE regex syntax) '(:b|)' for optional whitespace.
I used to have problems, though, with the equivalent '(|:b)' (in both cases without the quotes).
Any specific reason that jumps to mind? I haven't tried that lately, definitely never with version 11.x, but I remember it quite well, since it cost me some hours, if not days, to figure out...

Regards,

Johan

Matthew:
I haven't dug through the version history of the op-code processing, but there were some changes made between 10.0.2 and 11.0.0 to try to prevent some of the "match-the-whole-world" problems. So it could be that the sequence of op-codes that (:b|) or (|:b) generates has been changed to prevent hanging expressions.
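
As a rough illustration of how the order of the alternatives can matter, here is a small Python sketch. It assumes :b is modeled as [ \t]+ and uses Python's re engine, which tries alternatives left to right. Whether SlickEdit's op-codes ever behaved this way is not confirmed by anything in this thread. The names WS_FIRST, EMPTY_FIRST and leading are made up for the example.

```python
import re

# Assumed stand-ins for the SE patterns (:b|) and (|:b).
WS_FIRST = r"([ \t]+|)"      # try whitespace first, then the empty match
EMPTY_FIRST = r"(|[ \t]+)"   # try the empty match first, then whitespace


def leading(pattern: str, text: str) -> str:
    """Return what the first group captured at the start of text."""
    return re.match(pattern, text).group(1)


text = "  x"
print(repr(leading(WS_FIRST, text)))           # '  '
print(repr(leading(EMPTY_FIRST, text)))        # ''
print(repr(leading(EMPTY_FIRST + "x", text)))  # '  '
```

In an engine like this, the empty-first form only consumes whitespace when something later in the pattern forces it to backtrack, so the two spellings are not interchangeable in every context.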

srouleau:
Rather than adding :s, why not add support for PCRE (Perl Compatible Regular Expressions)?  It's a bit of a pain switching between Perl/grep/sed, MS's findstr, and SE.

Just wishful thinking :)

eblot:

--- Quote from: srouleau on August 12, 2006, 12:44:31 am ---Rather than adding :s, why not add support for PCRE (Perl Compatible Regular Expressions)?  It's a bit of a pain switching between Perl/grep/sed, MS's findstr, and SE.
Just wishful thinking :)

--- End quote ---

I've been asking the SE team for PCRE since release V8 or V9 ...
I guess they have not received enough requests from customers to add it (?)
