SlickEdit Community

SlickEdit Product Discussion => SlickEdit® => Topic started by: jporkkahtc on October 07, 2017, 02:09:02 PM

Title: Custom language, R - identifiers
Post by: jporkkahtc on October 07, 2017, 02:09:02 PM
Can I use a reg-ex for identifiers?

In R identifiers:
Start with A-Za-z, followed by A-Za-z0-9, _ or a dot.
They can also start with a dot, as long as the dot is not followed by a digit.
A Perl regex might look like: ((\.[A-Za-z_])|([A-Za-z]))[A-Za-z0-9_.]*

But, a letter is defined by isalnum(c) -- locale dependent, so accented letters and such are allowed.
Annoying, but not so terrible. You could simply make the regex include all these letters and ignore locale.
(This only applies to letters. The only digits allowed are 0...9).
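
A minimal sketch of the pattern above, written in Python only so it can be tested outside the editor. The names ASCII_IDENT, UNICODE_IDENT and is_identifier are made up for illustration, and Python's regex dialect is not necessarily the one SlickEdit uses. The second pattern approximates the "ignore locale" idea by accepting any Unicode letter while keeping digits restricted to 0-9.

```python
import re

# ASCII-only form of the identifier pattern from the post:
# a letter, or a dot followed by a letter/underscore, then letters,
# digits, underscores and dots.
ASCII_IDENT = re.compile(r"(?:\.[A-Za-z_]|[A-Za-z])[A-Za-z0-9_.]*")

# Locale-agnostic approximation: [^\W\d_] is roughly "any Unicode letter",
# [^\W\d] is "letter or underscore". Digits are spelled out as 0-9 so that
# non-ASCII digits are rejected.
UNICODE_IDENT = re.compile(r"(?:\.[^\W\d]|[^\W\d_])(?:[^\W\d]|[0-9.])*")


def is_identifier(text, pattern=ASCII_IDENT):
    """Return True if the whole of text matches the identifier pattern."""
    return pattern.fullmatch(text) is not None
```

With these definitions, is_identifier(".hidden") is accepted and is_identifier(".5x") is rejected, while an accented name such as "été" is accepted only when UNICODE_IDENT is passed as the pattern.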

HORRORS ...
R also lets you quote identifiers (who created this stupid language??)
But special rules decide whether a quoted token is a name or a string.
Backticks are preferred. For example (pay attention to ' vs ` ):
> 'a bad name' = 5
> 'a bad name'
[1] "a bad name"
> 'a bad name'+1
Error in "a bad name" + 1 : non-numeric argument to binary operator
> `a bad name`
[1] 5
> `a bad name` + 1
[1] 6


RStudio colors backtick-quoted names as identifiers, but not the single-quoted assignment target I show above.
Title: Re: Custom language, R - identifiers
Post by: Clark on October 08, 2017, 01:26:15 PM
It's probably best to define the identifier start chars as A-Za-z_ and the follow chars as 0-9 (you can add regex character sets for a group of unicode characters if you need to).

Here's what we use for the Scala start and follow chars.
Code:
<idchars v="$_\p{Ll}\p{Lu} \p{Lt}\p{Lo}\p{Nl}0-9"/>

Then add a regex which specifically matches identifiers starting with a "." (choose "identifier" as the color). Also add a regex to match single quoted identifiers.

This should do the trick. You may need to use lookahead or lookbehind to help out. In Scala and some other languages we support, a single quote can have more than one meaning. We use lookahead or lookbehind to screen out certain matches.
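
As a rough illustration of the lookahead idea, here is a sketch in Python (not SlickEdit's color coding syntax). It assumes a single-quoted token should be colored as an identifier only when an assignment operator (=, <- or <<-) follows it, and that backtick-quoted tokens are always identifiers. The names BACKTICK_NAME, QUOTED_NAME, STRING and classify are hypothetical, and escape sequences inside quotes are ignored to keep it short.

```python
import re

# Backtick-quoted names are always treated as identifiers.
BACKTICK_NAME = re.compile(r"`[^`\n]+`")

# A single-quoted token counts as a name only when an assignment operator
# follows. The lookahead checks for it without consuming it, and "==" is
# screened out with a nested negative lookahead.
QUOTED_NAME = re.compile(r"'[^'\n]*'(?=\s*(?:<<?-|=(?!=)))")

# Any other single-quoted token is an ordinary string (escapes ignored).
STRING = re.compile(r"'[^'\n]*'")


def classify(line):
    """Return (kind, text) pairs for the quoted tokens found in line."""
    found = []
    pos = 0
    while pos < len(line):
        match = BACKTICK_NAME.match(line, pos) or QUOTED_NAME.match(line, pos)
        if match:
            found.append(("identifier", match.group()))
            pos = match.end()
            continue
        match = STRING.match(line, pos)
        if match:
            found.append(("string", match.group()))
            pos = match.end()
            continue
        pos += 1  # not a quoted token, move on one character
    return found
```

Run against the lines from the first post, classify("'a bad name' = 5") reports an identifier and classify("'a bad name'+1") reports a string. The more specific pattern is tried before the general string pattern, which is the same ordering a color coding rule set would need.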

FYI: I promise you Scala is way worse than R. We have a bunch of very complicated built-in regular expressions defined to handle it (not in the color coding definition, because they would have been too confusing).