Author Topic: Regex nitpick  (Read 1061 times)

jporkkahtc

  • Senior Community Member
  • Posts: 1424
  • Hero Points: 111
  • Text
Regex nitpick
« on: April 17, 2017, 10:42:13 pm »
\o:v (?:[A-Za-z_$][A-Za-z0-9_$]*) - Matches a C variable.

But --- C/C++ variables do not allow "$"!

Also, it would be nice to allow user-defined subexpressions like this.
Somehow I could define
   \o:z = (?:[A-Za-z_][A-Za-z0-9_]*)
 
or possibly, redefine existing ones ...
    \o:v = (?:[A-Za-z_][A-Za-z0-9_]*)
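
For anyone comparing the two patterns, here is a minimal sketch using Python's `re` module (not SlickEdit's engine) that puts the `\o:v` expansion quoted above next to the stricter variant without `$`. The names `SLICK_V`, `STRICT_V`, and `is_identifier` are made up for illustration.

```python
import re

# Expansion of \o:v as quoted above: '$' is accepted in identifiers.
SLICK_V = re.compile(r"(?:[A-Za-z_$][A-Za-z0-9_$]*)")
# Stricter variant proposed above: no '$' anywhere in the identifier.
STRICT_V = re.compile(r"(?:[A-Za-z_][A-Za-z0-9_]*)")


def is_identifier(pattern, text):
    """Return True when the whole string matches the identifier pattern."""
    return pattern.fullmatch(text) is not None


# Print how each sample fares under both patterns.
for name in ("count", "_tmp1", "my$var", "$x", "9lives"):
    print(name, is_identifier(SLICK_V, name), is_identifier(STRICT_V, name))
```

The two patterns only disagree on names that contain `$`, which is why the difference rarely shows up in practice.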

Dennis

  • SlickEdit Team Member
  • Senior Community Member
  • *
  • Posts: 2244
  • Hero Points: 282
Re: Regex nitpick
« Reply #1 on: April 18, 2017, 01:56:01 pm »
The C++ Standard allows additional implementation-defined characters in identifiers.  Because it was historically used in identifiers, the dollar sign is allowed by most compilers.  Currently, I believe Clang warns, GCC has an option to warn about it, and Visual Studio just plain allows $ in identifiers.

So, really, this is a question of the letter of the standard vs. the practical reality.

But to really get to the point, in what way are you using "$" in your C++ code such that having it in the identifier regex causes incorrect matches?

jporkkahtc

  • Senior Community Member
  • Posts: 1424
  • Hero Points: 111
  • Text
Re: Regex nitpick
« Reply #2 on: April 18, 2017, 05:25:28 pm »
Well, I wasn't using it for C++.
I was doing a regex find-and-replace on a set of regular expressions - good grief (translating from Slick RE to Perl).
I didn't realize that $ was such a common extension; it's pretty rare that I've seen $ used in a variable name.

Perhaps, the feature I'd like more is for "\o:v" to be language sensitive :-)

Searching thru the Slick program files, it seems like these regex patterns are built into many of the *.exe and *.dll files. Making it a user setting and getting all of these binaries to know about the setting might be tricky.

Dennis

  • SlickEdit Team Member
  • Senior Community Member
  • *
  • Posts: 2244
  • Hero Points: 282
Re: Regex nitpick
« Reply #3 on: April 18, 2017, 09:03:25 pm »
That's a reasonable suggestion. However, for backwards compatibility, it would make more sense for \o:v to remain C identifiers, and to add a new shortcut like \o:l for a language-specific identifier.

The syntax you propose for defining subexpressions would be ambiguous. It would be better to add support for an existing syntax, such as Perl regular expression subroutines.  We would need to have Clark chime in on this one, because it can be tricky to implement, especially if you support recursive regex subroutines.

Clark

  • SlickEdit Team Member
  • Senior Community Member
  • *
  • Posts: 4016
  • Hero Points: 265
Re: Regex nitpick
« Reply #4 on: April 18, 2017, 10:17:34 pm »
Note that the word characters defined for the language are used for the word boundary stuff (i.e. word matching, \b, \B, \o<, \o>, etc.). That doesn't affect \o:v though.

I'd like to add a way to support user-definable regular expressions. I haven't thought about the possible syntax much. It needs to start with \o, which is how we add other extensions to Perl syntax. \o[name] or \o{name} could work. I don't think it's a good idea to redefine the existing one-letter stuff. For user-definable one-letter stuff we could come up with another syntax like \o!letter (some non-letter character, doesn't have to be !).

jporkkahtc

  • Senior Community Member
  • Posts: 1424
  • Hero Points: 111
  • Text
Re: Regex nitpick
« Reply #5 on: April 18, 2017, 11:38:59 pm »
I was thinking of adding an enhancement to gui_find to allow for env-var substitutions in general.
This could work in the reg-ex expression as well - the regex engine would never know about this.
I'd have to be careful about the syntax to ensure I don't interfere with normal reg-ex expressions.


So for FileTypes, I could type
    <!MyAlias>
    <$MyVar>
    <%rp>
which would get substituted much like "<Current Buffer Directory>" is substituted.
I don't know exactly what syntax I'd use, but something like:
    <!x> -- Alias "x"
    <$x> -- Env var "x"
    <%x> -- Slick substitutions that work some places for things like project path.
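
The substitution step described above could be sketched roughly as follows. This is Python rather than Slick-C, and everything in it is an assumption for illustration: the function name `expand_placeholders`, the three lookup tables, and the choice to leave unknown names untouched. It is not how gui_find works today.

```python
import re

# Sigil -> lookup-table key. The sigils follow the post above; the table
# names are hypothetical.
_SIGILS = {"!": "aliases", "$": "env", "%": "slick"}
# Matches <!name>, <$name> or <%name>; plain <...> text is not matched.
_PLACEHOLDER = re.compile(r"<([!$%])([^<>]+)>")


def expand_placeholders(text, aliases, env, slick):
    """Replace <!x>, <$x> and <%x> with values from the matching table."""
    tables = {"aliases": aliases, "env": env, "slick": slick}

    def substitute(match):
        sigil, name = match.group(1), match.group(2)
        table = tables[_SIGILS[sigil]]
        # Leave unknown names untouched so typos stay visible.
        return table.get(name, match.group(0))

    return _PLACEHOLDER.sub(substitute, text)


print(expand_placeholders(
    "<!bin>;<$OUT>;<%rp>",
    aliases={"bin": "*.dll;*.exe"},
    env={"OUT": "build"},
    slick={"rp": "/proj/"},
))
```

Text such as "<Current Buffer Directory>" has no sigil and passes through unchanged. A regex containing something like `(?<!a>)` would collide with this syntax, which is the kind of interference with normal expressions that would need care.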


Clark

  • SlickEdit Team Member
  • Senior Community Member
  • *
  • Posts: 4016
  • Hero Points: 265
Re: Regex nitpick
« Reply #6 on: April 19, 2017, 12:14:45 am »
Interesting idea. I don't like using environment variables except when I have to. Using aliases is ok. An alias profile specific for file types would be better. The downside is that the GUI and alias command would have to be enhanced. Maybe too much work for you.

Clark

  • SlickEdit Team Member
  • Senior Community Member
  • *
  • Posts: 4016
  • Hero Points: 265
Re: Regex nitpick
« Reply #7 on: April 19, 2017, 09:54:44 pm »
Perl regular expressions already have a way to define regular expressions and then use them later.

(?(DEFINE)(?<foo>x)(?<bar>y))(?&foo)(?&foo)(?&bar)  -- Matches xxy

The DEFINE stuff does not generate any regex code. It just defines two named expressions that can be used later.

The (?&foo) syntax specifies the text "x" and its associated flags be compiled at the referenced location. This is not a back reference.

The current SlickEdit Perl regular expressions don't have these constructs but I've added them for v22.

One possibility is we could extend the use of (?&name) to find external user defined named regular expressions if an internal named match group isn't found with the same name. Seems reasonable to me. In any case, I'm rolling around this idea.
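
As a rough illustration of looking up external named patterns, here is a Python sketch. Python's standard `re` module does not support `(?&name)`, so the sketch emulates the idea by textually inlining each reference as a non-capturing group before compiling. The library contents, the function name `inline_named_patterns`, and the depth limit are all assumptions, and this is not how SlickEdit implements it.

```python
import re

# Hypothetical user-defined library of named patterns.
USER_PATTERNS = {
    "ident": r"[A-Za-z_][A-Za-z0-9_]*",
    "int": r"[0-9]+",
    "assign": r"(?&ident)\s*=\s*(?&int)",
}

# Finds (?&name) references in a pattern string.
_CALL = re.compile(r"\(\?&([A-Za-z_][A-Za-z0-9_]*)\)")


def inline_named_patterns(pattern, library, max_depth=10):
    """Replace every (?&name) with (?:...) taken from the library."""

    def substitute(match):
        name = match.group(1)
        if name not in library:
            raise KeyError("unknown named pattern: " + name)
        return "(?:" + library[name] + ")"

    # Repeat so that library entries may reference other entries.
    for _ in range(max_depth):
        expanded = _CALL.sub(substitute, pattern)
        if expanded == pattern:
            return expanded
        pattern = expanded
    # Plain inlining cannot express recursion, so give up here.
    raise ValueError("named patterns nested too deeply or recursive")


expanded = inline_named_patterns(r"(?&assign);", USER_PATTERNS)
print(expanded)
print(bool(re.fullmatch(expanded, "x1 = 42;")))
```

Textual inlining cannot handle recursive definitions, and it would also rewrite an escaped literal such as `\(?&x)`, so a real implementation inside the regex compiler would behave differently in those cases.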

jporkkahtc

  • Senior Community Member
  • Posts: 1424
  • Hero Points: 111
  • Text
Re: Regex nitpick
« Reply #8 on: April 19, 2017, 10:46:07 pm »
(?(DEFINE...
Is there anything *not* in Perl RE?

I was thinking about this feature primarily for LookIn, FileType, and especially Exclude.
Doing it in SearchFor would be just for completeness.

I tend to have a very long "Exclude" field, filled with things like *.dll, *.exe, .git\, etc.
What I'd like to do is be able to make Exclude be
    <!bin_file_types>;<!git_ignores>;<!vstudio>
I have aliases for these now, but to use them in Exclude, I have to expand them.
It would be much easier if I could use them without expanding them.


For "bin_file_types", I just created an alias with a list of them.
Slick could create such a filter based on the file types it knows are binary -- from Tools > Options > Languages > Miscellaneous Languages > Binary
(Annoyingly, "zip" is a binary, so excluding all binaries might prevent "look in zip files").

So, adding this to my previous list:
    <!MyAlias>
    <$MyVar>
    <%rp>
    <@Binary>
    <@C/C++>

The "@" expansion would generate the pattern for all the associated file types.
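
A minimal Python sketch of what the "@" expansion might produce is shown below. The extension table is invented for the example (in SlickEdit it would come from the language setup, which is not modelled here), and `expand_language` with its `keep` parameter is a hypothetical helper.

```python
# Hypothetical table of language name -> file extensions.
LANGUAGE_EXTENSIONS = {
    "Binary": ["dll", "exe", "obj", "zip"],
    "C/C++": ["c", "cpp", "h", "hpp"],
}


def expand_language(name, table, keep=()):
    """Return a ';'-separated wildcard list for one language.

    Extensions listed in `keep` are left out of the result, e.g. to
    exclude binaries while still searching inside zip files.
    """
    extensions = [ext for ext in table[name] if ext not in keep]
    return ";".join("*." + ext for ext in extensions)


print(expand_language("Binary", LANGUAGE_EXTENSIONS, keep=("zip",)))
print(expand_language("C/C++", LANGUAGE_EXTENSIONS))
```

The `keep` argument is one possible way around the zip annoyance mentioned above: the generated Exclude pattern skips `*.zip`, so "look in zip files" would still work.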

Clark

  • SlickEdit Team Member
  • Senior Community Member
  • *
  • Posts: 4016
  • Hero Points: 265
Re: Regex nitpick
« Reply #9 on: April 19, 2017, 10:51:21 pm »
Replacements for file types and excludes would definitely be useful. You've got some good ideas here.

I'd still like user definable regex patterns.

jporkkahtc

  • Senior Community Member
  • Posts: 1424
  • Hero Points: 111
  • Text
Re: Regex nitpick
« Reply #10 on: April 19, 2017, 11:06:55 pm »
The Perl syntax is interesting, but it is not something I'd be likely to use in Slick -- it's just way too much to type.
I have made use of the "$(<1>x|<2>y|z)" replace syntax - but actually typing in these kinds of patterns is too hard for me .. I generate them with my translateTerms() macro.


Saved regex patterns with the (?&foo) syntax you mentioned could be useful, though.

Hopefully user-defined regex patterns could be integrated into the regex eval tool && saved search expressions in some way.

jporkkahtc

  • Senior Community Member
  • Posts: 1424
  • Hero Points: 111
  • Text
Re: Regex nitpick
« Reply #11 on: April 24, 2017, 06:08:44 pm »
I found "_unix_expansion": If "def_unix_expansion" is 1, then "$var" expansions are done on the LookIn and Exclude fields. Unfortunately, I also found that if I did "set var=value" in the build window, these vars aren't picked up elsewhere in Slick (with getenv).

OK, but it's a start.
So I hacked up a custom version of "_unix_expansion" -- see attached -- with @: "@exclude"
Now, in "excludes", I can use aliases without expanding them.

Unfortunately, I also had to patch "alias.e", in "find_alias2" because some properties are apparently not defined for text edit controls.

Code:
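            #ifdef jporkka
            // Local patch: these properties are apparently not defined for
            // non-editor controls (e.g. text boxes), so only copy them when the
            // original view is an editor control; otherwise use fixed defaults.
            p_UTF8 = (orig_view_id.p_object==OI_EDITOR)?orig_view_id.p_UTF8:true;
            p_encoding = (orig_view_id.p_object==OI_EDITOR)?orig_view_id.p_encoding:0;
            p_newline = (orig_view_id.p_object==OI_EDITOR)?orig_view_id.p_newline:"\n";
            p_indent_with_tabs = (orig_view_id.p_object==OI_EDITOR)?orig_view_id.p_indent_with_tabs:false;
            #else
            // Original code: copy the properties unconditionally.
            p_UTF8 = orig_view_id.p_UTF8;
            p_encoding = orig_view_id.p_encoding;
            p_newline = orig_view_id.p_newline;
            p_indent_with_tabs = orig_view_id.p_indent_with_tabs;
            #endif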