Author Topic: Regular Expression question  (Read 3468 times)

jimlangrunner

  • Senior Community Member
  • Posts: 360
  • Hero Points: 31
  • Jim Lang - always a student.
Regular Expression question
« on: April 17, 2008, 01:20:11 PM »
I'm processing a big file and need to "fix up" a list of tab-separated fields.  The data looks something like this:
Code: [Select]
KS FD 3 1 1 4 A 8 F 7 0
KS FD 3 1 1 4 A 9 F 9 0
KS FD 3 1 1 4 A 10 F 10 0
KS FD 3 1 1 4 A 1 M 4 0
KS FD 3 1 1 4 A 2 M 4 0

The Reg ex I'm using looks like this:
Code: [Select]
^KS\tFD\t3\t{+:d|11|12}\t{+:d}\t{+:d}\t{A|C}\t{+:d|10}\t{F|M}\t{+:d|1+:d}\t{+:d}$

Note that, for the 3rd line of the data, I have to use {+:d|10} to get it to find the "10"; otherwise it skips over the line with the 10 in it.  Similarly, I have other fields with multi-digit values that are skipped unless I "or" them in.  The expression brackets exist because I need to turn this into a comma-separated list.

I expect the expression "+:d" to mean "0 or more" (+) and digit (:d) to get me anything from "0" to "000111222333".  Am I missing something?

I've got the work done for now, but would like to figure this out so that the next time I need to do this kind of thing I won't pull what little hair I have left out getting it to catch the special cases.

I'm using the Slickedit regular expression in SE 2008 (13.0.0.0).  I'm using the Visual C++ 6 emulation.  Any suggestion would be appreciated.

Thanks,
Jim

EDIT - Hold on.  I may have found what I'm looking for in help - the Regex Evaluator will likely help.  I'll repost if that doesn't fix em up.  Thanks.
« Last Edit: April 17, 2008, 01:29:40 PM by jimlangrunner »

Lee

  • SlickEdit Team Member
  • Senior Community Member
  • *
  • Posts: 1299
  • Hero Points: 130
Re: Regular Expression question
« Reply #1 on: April 17, 2008, 01:37:26 PM »
The quantifier token needs to come after the token you want to match. You want ":d+", which translates to "match character set [0-9] one or more times".  Putting the quantifier first means nothing in that situation, so the expression will only match a single character in the set [0-9].  If you want zero or more occurrences, then you would use :d* (at least in SlickEdit syntax).

You should check out the help section on Regular Expressions; the documentation has gone through a great deal of improvement in recent versions and may help out.
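
To illustrate the point about quantifier placement, here is a minimal sketch in Python's `re` syntax rather than SlickEdit's. It assumes `\d` as a stand-in for `:d`; the helper name `matches` is made up for this example. The quantifier follows the token it applies to, just as in `:d+` and `:d*`.

```python
import re

# Python `re` stand-ins (an assumption, not SlickEdit syntax):
# \d plays the role of :d, and each quantifier is written AFTER its token.
one_digit = re.compile(r"\d")      # exactly one digit
one_or_more = re.compile(r"\d+")   # one or more digits
zero_or_more = re.compile(r"\d*")  # zero or more digits


def matches(pattern, text):
    """Return True when the whole of `text` matches `pattern`."""
    return pattern.fullmatch(text) is not None


# Compare how each pattern treats a single digit, two digits, and an empty field.
for field in ["1", "10", ""]:
    print(
        repr(field),
        matches(one_digit, field),
        matches(one_or_more, field),
        matches(zero_or_more, field),
    )
```

The single-digit pattern accepts "1" but rejects "10", which is the same symptom described in the original question.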

jimlangrunner

Re: Regular Expression question
« Reply #2 on: April 17, 2008, 01:37:47 PM »
No, I don't feel like too much of an idiot, but a little. 

Code: [Select]
{+:d} will pick up 1, but not 10
Code: [Select]
{:d+} will pick up 1 as well as 10

The key is that the "0 or more" (*) and "1 or more" (+) modifiers are post-fix, not prefix. 
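
For anyone doing the same conversion outside SlickEdit, the corrected whole-line expression might look roughly like this in Python. This is only a sketch under assumptions: `(...)` replaces the `{...}` groups, `\d+` replaces `:d+`, the fields are separated by real tab characters, and the `to_csv` helper and its choice to output only the captured fields are invented for illustration.

```python
import re

# Python rendering (an assumption, not SlickEdit syntax) of the corrected pattern:
# every numeric field is \d+, so no "|10" or "|11|12" alternatives are needed.
LINE = re.compile(
    r"^KS\tFD\t3\t(\d+)\t(\d+)\t(\d+)\t(A|C)\t(\d+)\t(F|M)\t(\d+)\t(\d+)$"
)


def to_csv(line):
    """Return the captured fields joined by commas, or None if the line does not match."""
    m = LINE.match(line)
    return ",".join(m.groups()) if m else None


# Third sample line from the first post, with tabs between the fields.
sample = "KS\tFD\t3\t1\t1\t4\tA\t10\tF\t10\t0"
print(to_csv(sample))  # 1,1,4,A,10,F,10,0
```

Lines that do not fit the pattern return None, so they can be counted or logged rather than silently skipped.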

So I learned something today, and I thank you for taking the time to read this. 

Jim.

jimlangrunner

Re: Regular Expression question
« Reply #3 on: April 17, 2008, 01:39:13 PM »
...
You should check out the help section on Regular Expressions; the documentation has gone through a great deal of improvement in recent versions and may help out.
Lee,

You're right.  I should read more before posting. 

Thanks,
Jim.