Author Topic: Regular Expression question (Read 3468 times)

jimlangrunner · « **on:** April 17, 2008, 01:20:11 PM »

I'm processing a big file and need to "fix up" a list of tab-separated fields. The data looks something like this:

KS	FD	3	1	1	4	A	8	F	7	0
KS	FD	3	1	1	4	A	9	F	9	0
KS	FD	3	1	1	4	A	10	F	10	0
KS	FD	3	1	1	4	A	1	M	4	0
KS	FD	3	1	1	4	A	2	M	4	0

The regex I'm using looks like this:

Code:

^KS\tFD\t3\t{+:d|11|12}\t{+:d}\t{+:d}\t{A|C}\t{+:d|10}\t{F|M}\t{+:d|1+:d}\t{+:d}$

Note that, for the 3rd data line, I have to use {+:d|10} to get it to find the "10"; otherwise it skips over the line with the 10 in it. Similarly, I have other fields with multi-digit values that are skipped unless I "or" them in. The expression brackets exist because I need to turn this into a comma-separated list.

I expect the expression "+:d" to mean "0 or more" (+) and digit (:d) to get me anything from "0" to "000111222333". Am I missing something?

I've got the work done for now, but would like to figure this out so that the next time I need to do this kind of thing I won't pull what little hair I have left out getting it to catch the special cases.

I'm using the SlickEdit regular expression syntax in SE 2008 (13.0.0.0), with the Visual C++ 6 emulation. Any suggestions would be appreciated.

Thanks,
Jim

EDIT - Hold on. I may have found what I'm looking for in help - the Regex Evaluator will likely help. I'll repost if that doesn't fix em up. Thanks.

Lee · « **Reply #1 on:** April 17, 2008, 01:37:26 PM »

The quantifier token needs to come after the token you want to match. You want ":d+", which translates to "minimally match character set [0-9] one or more times". Putting the quantifier first means nothing in that situation, and the expression will only match a single character in the set [0-9]. If you want zero or more occurrences, then you would use :d* (at least in SlickEdit syntax).
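For readers more used to another regex flavor, here is a rough Python sketch of the same point. It is only an analogy, not SlickEdit syntax: Python's `re` module writes the digit class as `\d`, its plain `+` is greedy, and `+?` is the minimal form. The helper name `first_match` is made up for this illustration.

```python
import re
from typing import Optional


def first_match(pattern: str, text: str) -> Optional[str]:
    """Return the first substring of text matched by pattern, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


# No quantifier: exactly one digit is matched.
single = first_match(r"\d", "10")

# Postfix quantifier: one or more digits.
multi = first_match(r"\d+", "10")

# A minimal ("lazy") quantifier still grows until the rest of the
# pattern can match, so a following tab forces it to take both digits.
lazy = re.match(r"(\d+?)\t", "10\tF").group(1)

print(single, multi, lazy)
```

In Python a leading quantifier such as `+\d` is rejected outright with an `re.error`, rather than being quietly treated as a single-digit match.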

You should check out the help section on Regular Expressions; the documentation has gone through a great deal of improvement in recent versions and may help.

jimlangrunner · « **Reply #2 on:** April 17, 2008, 01:37:47 PM »

No, I don't feel like too much of an idiot, but a little.

Code:

{+:d} will pick up 1, but not 10

Code:

{:d+} will pick up 1 as well as 10

The key is that the "0 or more" (*) and "1 or more" (+) modifiers are postfix, not prefix.
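With the quantifier moved to the postfix position, the whole-line expression from the first post no longer needs the "or"-ed special cases. A minimal sketch of the tab-to-comma conversion in Python follows, as an analogy only: it uses Python's `re` syntax (`\d+` and parentheses in place of `:d+` and braces), the names `LINE_RE` and `to_csv` are invented here, and emitting only the captured fields is an assumption, since the thread does not show the replacement string that was actually used.

```python
import re
from typing import Optional

# One capture group per bracketed field in the original expression.
LINE_RE = re.compile(
    r"^KS\tFD\t3\t(\d+)\t(\d+)\t(\d+)\t(A|C)\t(\d+)\t(F|M)\t(\d+)\t(\d+)$"
)


def to_csv(line: str) -> Optional[str]:
    """Join the captured fields with commas, or return None if no match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return ",".join(m.groups())


sample = "KS\tFD\t3\t1\t1\t4\tA\t10\tF\t10\t0"
print(to_csv(sample))
```

Lines that do not fit the pattern return `None`, so they can be skipped or logged rather than silently mangled.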

So I learned something today, and I thank you for taking the time to read this.

Jim.

jimlangrunner · « **Reply #3 on:** April 17, 2008, 01:39:13 PM »

Quote from: Lee on April 17, 2008, 01:37:26 PM

...
You should check out the help section on Regular Expressions; the documentation has gone through a great deal of improvement in recent versions and may help.

Lee,

You're right. I should read more before posting.

Thanks,
Jim.
