Author Topic: Regular expression not matching a string  (Read 5912 times)

pitord

  • Junior Community Member
  • Posts: 3
  • Hero Points: 0
Regular expression not matching a string
« on: January 19, 2011, 11:24:55 PM »
Hello,

Using a regular expression (preferably UNIX syntax), how does one search for the lines that do not contain the string "Albany", and ultimately for all the lines that contain "Pittsburgh" and do not contain "Albany"?

Montreal   Toronto    Albany  Cleveland
Detroit    New York   Albany  Cleveland
Pittsburgh Washington Albany  Akron
Pittsburgh Washington Albany  Akron
Pittsburgh Washington Albany  Akron
Pittsburgh Los Angeles Hanover Johnstown
Pittsburgh New York   Albany  Sacramento
Pittsburgh Seattle    Albany  Sacramento
Pittsburgh Washington Chicago Sacramento

Thank you in advance,

Peter

MartyL

  • Senior Community Member
  • Posts: 166
  • Hero Points: 29
  • Synergex
Re: Regular expression not matching a string
« Reply #1 on: January 20, 2011, 12:10:54 AM »
This ought to do it for you.

Code:
(?:Pittsburgh)+?(?!.*Albany)

SlickEdit's Regex Evaluator is extremely helpful in building your expression. Check out Tools > Regex Evaluator.

The little dropdown to the right of the expression box has a shortcut to SlickEdit's help section on regular expressions. They break down the different types for you, so Unix format is cake even if you weren't previously familiar with it.
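
For anyone who wants to check the expression outside the editor, here is a minimal sketch in Python. It assumes Python's `re` module treats `(?:...)`, `+?` and `(?!...)` the same way SlickEdit's Unix syntax does for this pattern. The names `LINES`, `NO_ALBANY`, `PITTSBURGH_NOT_ALBANY`, `ANCHORED` and `matching_lines` are made up for the illustration, and the two `^(?!.*Albany)` patterns are additions that are not part of the expression above.

```python
import re

# Sample lines copied from the original question.
LINES = [
    "Montreal   Toronto    Albany  Cleveland",
    "Detroit    New York   Albany  Cleveland",
    "Pittsburgh Washington Albany  Akron",
    "Pittsburgh Washington Albany  Akron",
    "Pittsburgh Washington Albany  Akron",
    "Pittsburgh Los Angeles Hanover Johnstown",
    "Pittsburgh New York   Albany  Sacramento",
    "Pittsburgh Seattle    Albany  Sacramento",
    "Pittsburgh Washington Chicago Sacramento",
]

# Lines with no "Albany" anywhere (first half of the question).
NO_ALBANY = re.compile(r"^(?!.*Albany)")

# The expression from the reply: "Pittsburgh" with no "Albany" after it.
PITTSBURGH_NOT_ALBANY = re.compile(r"(?:Pittsburgh)+?(?!.*Albany)")

# Variant that checks the whole line for "Albany", not just the text
# that follows "Pittsburgh".
ANCHORED = re.compile(r"^(?!.*Albany).*Pittsburgh")


def matching_lines(pattern, lines):
    """Return the lines in which the pattern matches somewhere."""
    return [line for line in lines if pattern.search(line)]


for line in matching_lines(PITTSBURGH_NOT_ALBANY, LINES):
    print(line)
```

On the sample data all three patterns pick out the Hanover and Chicago lines. One thing to keep in mind: the lookahead in the original expression only inspects the text after "Pittsburgh", so a line with "Albany" before "Pittsburgh" would still match it. The anchored variant is one way around that if the column order is not guaranteed.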

pitord

Re: Regular expression not matching a string
« Reply #2 on: January 20, 2011, 03:57:46 PM »
MartyL,

What do you mean, "this ought to do it for you"?  It works like a dream; this is quite a thing.  Did you actually construct it using the little dropdown to the right of the expression box? No way.
I would like to ask a couple of questions about the syntax, if you don't mind.  I tried to disassemble your code and attach SlickEdit's Help explanations to each piece:
(?:Pittsburgh)  =>  (?:X)  =>  Matches subexpression X but does not define a tagged expression.
+?              =>  X+?    =>  Minimal match of one or more occurrences of X.
(?!.*Albany)    =>  (?!X)  =>  Search fails if expression X is matched. The expression ^(?!if) matches the beginning of all lines that do not start with if.
I think I have it right thus far. The question is: what is the .* doing? I was thinking that * on its own (as a wildcard) would do it, but no way, it messes the whole thing up beyond recognition.
I look in the same SlickEdit Help and I see:
s.*?t  =>  Matches the letter s followed by any number of characters followed by the nearest letter t. Two possible matches are seat and st.
That looks good, but what is the ? doing here?
I appreciate your help, great stuff,

Pedro
 

MartyL

Re: Regular expression not matching a string
« Reply #3 on: January 20, 2011, 05:22:33 PM »
Hi Pedro,

I did not use the dropdown to build it. I used SlickEdit's very helpful help to put it together.

Your breakdown looks correct.

?:  => could be omitted if you need to use the tagged expressions. If you are only using the regex to check for a match and don't care about using it to help break down your string, then the ?: should perform slightly faster.
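
To see what "tagged" means in practice, here is a small sketch in Python, on the assumption that its capture groups correspond to SlickEdit's tagged expressions. The variable names are made up for the illustration, and it makes no attempt to measure the speed difference.

```python
import re

line = "Pittsburgh Washington Chicago Sacramento"

# With plain parentheses the matched "Pittsburgh" is stored as group 1.
tagged = re.search(r"(Pittsburgh)+?(?!.*Albany)", line)

# With (?:...) the same text is matched but nothing is stored.
untagged = re.search(r"(?:Pittsburgh)+?(?!.*Albany)", line)

tagged_groups = tagged.groups()      # ('Pittsburgh',)
untagged_groups = untagged.groups()  # ()

print(tagged_groups, untagged_groups)
```

Both searches match the same text; the only difference is whether a group is recorded for later use.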

+?  => is a minimal match of one or more of the (?:Pittsburgh)'s. If you want to find one and only one, use (?:Pittsburgh){1}, which behaves the same as the group with no quantifier at all.
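
The difference between the quantifiers only shows up when the word repeats back to back. A quick sketch in Python, assuming its `+?`, `+` and `{1}` behave like the SlickEdit Unix equivalents; the doubled sample string is invented for the illustration.

```python
import re

# Invented sample with "Pittsburgh" repeated back to back.
text = "PittsburghPittsburgh Chicago"

minimal = re.search(r"(?:Pittsburgh)+?", text).group(0)   # stops after one
maximal = re.search(r"(?:Pittsburgh)+", text).group(0)    # takes both
exactly_one = re.search(r"(?:Pittsburgh){1}", text).group(0)

print(minimal)
print(maximal)
print(exactly_one)
```

On the sample lines in this thread the word never repeats, so all three forms would match the same text there.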

?!  => is a fail condition, correct. That's the key to making sure that Albany is not in the string.

.*  => is what absorbs the rest of the string between Pittsburgh and your failure condition, Albany. The period is a wildcard that matches any single character, and the asterisk is a maximal match of zero or more occurrences of it. If you used this wildcard outside of your failure condition (?!), then it would consume the Albany as well and you would never find your failure case.
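
The effect of where the .* sits can be checked with a short Python sketch, again assuming Python's `re` module handles negative lookahead the way SlickEdit's Unix syntax does. The variable names are made up for the illustration.

```python
import re

line = "Pittsburgh Washington Albany  Akron"

# .* inside the lookahead: scans ahead for "Albany" anywhere later in
# the line, finds it, and so the search fails as intended.
inside = re.search(r"Pittsburgh(?!.*Albany)", line)

# No .* at all: only the text immediately after "Pittsburgh" is tested,
# and that is " Washington", so the search wrongly succeeds.
immediate = re.search(r"Pittsburgh(?!Albany)", line)

# .* outside the lookahead: it consumes the rest of the line first, so
# the lookahead is tested at the end of the line and never sees "Albany".
outside = re.search(r"Pittsburgh.*(?!Albany)", line)

print(inside)
print(immediate.group(0))
print(outside.group(0))
```

Only the first form rejects the line; the other two match it even though "Albany" is present.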

In the example (s.*?t), the period is a wildcard and *? is a minimal match of zero or more characters. Minimal means the wildcard takes as few characters as possible, so the match ends at the nearest "t" rather than running on to the last "t" on the line.
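
A short Python sketch of minimal versus maximal matching, assuming `*?` and `*` mean the same thing in Python's `re` module as in SlickEdit's Unix syntax. The sample string "seat belt" is invented for the illustration.

```python
import re

text = "seat belt"

# Minimal match: the wildcard gives up as soon as a "t" can end the match.
minimal = re.search(r"s.*?t", text).group(0)

# Maximal match: the wildcard runs to the last "t" it can reach.
maximal = re.search(r"s.*t", text).group(0)

print(minimal)  # seat
print(maximal)  # seat belt
```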

Hope this helps,
Marty

pitord

Re: Regular expression not matching a string
« Reply #4 on: January 20, 2011, 07:59:38 PM »
Marty,

And thanks to your explanations I even understand how it works.  This is great stuff.
Really appreciate your help.

Pedro