SlickEdit Community

Archived Beta Discussions => SlickEdit 201x Beta Discussions => SlickEdit 2016 v21 Beta Discussion => Topic started by: CyberZombie on September 14, 2016, 08:24:33 PM

Title: Incorrect replace using Perl regex
Post by: CyberZombie on September 14, 2016, 08:24:33 PM
When I clear trailing whitespace, I do a global replace with the expression "\s+$". In B2 and B3 it goes one better: it also wipes out empty lines.
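
The symptom can be reproduced in any engine where \s includes line terminators. Below is a minimal sketch using Python's re module as a stand-in. SlickEdit's Perl-syntax engine is not guaranteed to behave identically, and the sample text and the helper name strip_trailing are made up for illustration.

```python
import re

# Hypothetical sample buffer: "alpha" has trailing spaces and is followed
# by an empty line; "beta" has a trailing tab.
text = "alpha   \n\nbeta\t\n"

# \s includes \n and \r, so with multi-line anchors the match can run
# across a line break and swallow the empty line that follows.
greedy = re.sub(r"\s+$", "", text, flags=re.MULTILINE)

def strip_trailing(buf: str) -> str:
    """Remove only spaces and tabs at end of line, keeping line breaks.

    The lookahead permits an optional \r so CRLF-terminated lines are
    handled as well as LF-terminated ones.
    """
    return re.sub(r"[ \t]+(?=\r?$)", "", buf, flags=re.MULTILINE)

cleaned = strip_trailing(text)
```

Restricting the class to space and tab, instead of relying on \s, is what keeps the empty line in place here.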

Also, I expect to only need "\n" to signify end-of-line. But for Windows 0x0d 0x0a terminators, I have to use "\n\r" - new to me from V15.
Title: Re: Incorrect replace using Perl regex
Post by: Clark on September 14, 2016, 08:39:14 PM
v15 and v16 are the same with respect to these subtle differences. Both have the new, much more accurate Perl syntax support. There are a number of Perl regex things I'm not a fan of, but SlickEdit's Perl regex support needs to be very accurate.

SlickEdit regex doesn't have these issues, but it is very different from Perl syntax.
Title: Re: Incorrect replace using Perl regex
Post by: jporkkahtc on September 14, 2016, 08:52:21 PM
WRT line endings, use \R - it matches any line ending.
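
For comparison, here is a rough sketch of what \R covers, written in Python. Python's standard re module does not provide \R, so the pattern below spells out an approximation: CRLF as a single unit, otherwise any one vertical-whitespace character. The names ANY_EOL and split_lines are hypothetical, and the exact set SlickEdit accepts is an assumption based on Perl's documentation.

```python
import re

# Approximation of Perl's \R: try CRLF first so it counts as one line
# ending, then fall back to a single LF, VT, FF, CR, NEL, LS, or PS.
ANY_EOL = re.compile(r"\r\n|[\n\x0b\x0c\r\x85\u2028\u2029]")

def split_lines(buf: str) -> list:
    """Split a buffer on any line ending, whatever its style."""
    return ANY_EOL.split(buf)

# Mixed Windows, Unix, and old Mac line endings in one buffer.
parts = split_lines("a\r\nb\nc\rd")
```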

The Perl RE spec has some odd things. The one that bugs me the most:
    [^abc]
This matches any character other than a, b, or c, including newlines.
So generally when using this you want to explicitly exclude newlines as well:
    [^abc\r\n]
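
A small sketch of the difference, again using Python's re module as a stand-in for a Perl-style engine (the sample string is made up):

```python
import re

# Two lines, neither containing a, b, or c.
sample = "xyz\nxyz"

# [^abc] also matches "\n", so one match spans both lines.
across = re.findall(r"[^abc]+", sample)

# Excluding \r and \n as well keeps each match on a single line.
per_line = re.findall(r"[^abc\r\n]+", sample)
```

The first call returns a single match that contains the newline; the second returns one match per line.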


This is a useful RE resource: http://www.regular-expressions.info/refcharacters.html
It would be nice if they included Slick RE information too  :)
Title: Re: Incorrect replace using Perl regex
Post by: CyberZombie on September 14, 2016, 09:13:38 PM
Yes - wrt \n\r I was relying on old incorrect behavior :) Thanks jporkkahtc for the \R pointer.

It still doesn't explain \s+$ removing empty lines as \n and \r aren't whitespace characters...
Title: Re: Incorrect replace using Perl regex
Post by: CyberZombie on September 14, 2016, 09:22:40 PM
Follow-up - from the perldoc, there is an exception (http://perldoc.perl.org/perlrecharclass.html):
Quote
If the /a modifier is in effect ...

In all Perl versions, \s matches the 5 characters [\t\n\f\r ]; that is, the horizontal tab, the newline, the form feed, the carriage return, and the space. Starting in Perl v5.18, it also matches the vertical tab, \cK . See note [1] below for a discussion of this.
And from the regex modifiers (http://perldoc.perl.org/perlre.html#Character-set-modifiers):
Quote
The /a modifier, on the other hand, may be useful. Its purpose is to allow code that is to work mostly on ASCII data to not have to concern itself with Unicode.
Does this mean that I should be doing all my text editing using Unicode to get \s+ to not match newline?
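
Going by the perldoc passage quoted above, \s matches newline and carriage return with or without /a, so switching between ASCII and Unicode matching should not change that part. The sketch below checks the analogous behavior in Python, where the re.ASCII flag is only a loose counterpart of Perl's /a. The helper is_space is hypothetical, and nothing here confirms how SlickEdit itself behaves.

```python
import re

def is_space(ch: str, ascii_only: bool) -> bool:
    """Report whether a single character matches \\s in the chosen mode."""
    flags = re.ASCII if ascii_only else 0
    return re.fullmatch(r"\s", ch, flags) is not None

# Line terminators count as whitespace in both modes.
newline_ascii = is_space("\n", ascii_only=True)
newline_unicode = is_space("\n", ascii_only=False)

# The modes differ only for non-ASCII whitespace such as NO-BREAK SPACE.
nbsp_ascii = is_space("\u00a0", ascii_only=True)
nbsp_unicode = is_space("\u00a0", ascii_only=False)
```

To strip trailing blanks without touching line breaks, an explicit class such as [ \t]+ sidesteps the question in either mode.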