Regular Expression Help

Enright3 · October 4, 2006, 6:43pm

OK, I’m a bit rusty on my regular expressions (like 10 years rusty).

I have a comma delimited text file, and I want to find any value immediately following the 7th comma that does NOT equal “M”, “F”, or “U”.

For example, I want to find the highlighted value (the "X") in the following sample string:
"11111111","","Level","Practice","","","","X","01/01/1900",""," "

What would the regular expression be to find that instance?

E3

rackman · October 4, 2006, 7:53pm

Something like this:

.*,.*,.*,.*,.*,"[^MFU]",.*

should work, I think. I tried it and it worked in my tests.

rackman · October 4, 2006, 8:00pm

Sorry that should have been

.*,.*,.*,.*,.*,.*,.*,"[^MFU]",.*

I left out a few commas

black_rabbit · October 4, 2006, 8:01pm

I’d do it with grep and awk, but I’m not a programmer:

echo "$LINE" | awk -F"," '{print $8}' | grep '[A-EG-LN-TV-Z]'

You could probably do that a helluva lot more efficiently with a real regex, but I’m lazy.

Derleth · October 4, 2006, 8:03pm

If your regex engine supports it, this is more readable:


([^,]+,){7}"[^MFU]",.*

The Perl regex engine and many others support that notation now.

You might also make the first set of parens noncapturing by adding ?: to the very front of the expression inside them. Perl supports that, but I don’t know about any others.

CookingWithGas · October 4, 2006, 8:14pm

RegExp uses greedy matching so this will not work on a line that has more than 8 commas. I don’t think the OP said how many commas were on a line. You need to anchor the beginning with

^

and instead of

.*

use

[^,]*

Note that Derleth uses this in the more economical version.
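
To make the greedy problem concrete, here is a small sketch in Python's `re` module. Python is only my assumption; the OP never said which engine is in use. The test line is made up: the value after the 7th comma is a perfectly valid "M", but a later field happens to hold "X".

```python
import re

# Hypothetical line: the field after the 7th comma is a valid "M",
# but a field further along contains "X".
line = '"1","","Level","Practice","","","","M","01/01/1900","X",""'

# Each .* can swallow commas, so the "7 commas" can land anywhere.
greedy = re.compile(r'.*,.*,.*,.*,.*,.*,.*,"[^MFU]",.*')

# Anchored at the start, and [^,]* cannot cross a comma.
anchored = re.compile(r'^([^,]*,){7}"[^MFU]",.*')

greedy_hit = greedy.search(line) is not None
anchored_hit = anchored.search(line) is not None
print(greedy_hit, anchored_hit)  # prints: True False
```

The greedy pattern reports a hit because it is free to line up on the later "X"; the anchored one counts exactly seven commas and correctly finds nothing wrong.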

Omphaloskeptic · October 4, 2006, 8:26pm

Using .* to match a single field is bad; that will match one or more fields. Using [^,]*, will match exactly one field.

The replies so far will only find single quoted characters. (So they will find a seventh field “X” but not “XX” or “”.) To find multiple-character strings too, you have to be more careful. If you know that the fields are always double-quoted, you can use something like
"(|[^,]{2,}|[^MFU,])"
to match a quoted string with either zero or at least two characters between the quotes, or with a single character which is not M, F, or U.

This won’t find fields like M or X, though. If you want to find unquoted and malformed fields like M and "M"M as well, add extra conditions to the whole thing:
"(|[^,]{2,}|[^MFU,])"|[^,"][^,]*|[^,]*[^,"]
(this checks for fields which either start or end with a nonquote). Now you must anchor the end to make sure you get the whole field; you end up with something like


/^([^,]*,){7}("(|[^,]{2,}|[^MFU,])"|[^,"][^,]*|[^,]*[^,"])(,.*)?$/

(the desired field is now in $2).
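
For anyone who wants to try it, here is a rough Python version of that pattern. Python is my assumption, not something the OP specified. It drops the surrounding slashes and writes the second alternative as `[^,"][^,]*` with nothing after the `*`. The helper name `bad_value` is made up for illustration.

```python
import re

# Group 1 is the repeated "field plus comma"; group 2 is the field of interest.
FIELD_8 = re.compile(
    r'^([^,]*,){7}("(|[^,]{2,}|[^MFU,])"|[^,"][^,]*|[^,]*[^,"])(,.*)?$'
)

def bad_value(line):
    """Return the field after the 7th comma if the pattern flags it, else None."""
    m = FIELD_8.match(line)
    return m.group(2) if m else None

sample = '"11111111","","Level","Practice","","","","X","01/01/1900",""," "'
print(bad_value(sample))
```

Note that an unquoted M is flagged too, which is the intent: the pattern treats anything other than a properly quoted M, F, or U as suspect.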

Really, finding exceptional cases like this is probably better done with actual logic instead of coding some write-only regexp though. Use a regexp to check for proper quoting; then split the line into fields and check each one in code.
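
A minimal sketch of the "split, then check in code" idea, again assuming Python. It leans on the standard `csv` module to do the splitting and unquoting instead of a hand-written quoting regexp, so it is a swap-in rather than exactly the two-step approach described above. The names `find_bad_rows` and `ALLOWED` are made up for illustration.

```python
import csv
import io

ALLOWED = {"M", "F", "U"}

def find_bad_rows(text, index=7):
    """Yield (line_number, value) for rows whose field at `index` is not allowed.

    index=7 is the eighth field, i.e. the value right after the 7th comma.
    """
    reader = csv.reader(io.StringIO(text))
    for row in reader:
        if len(row) <= index:
            continue  # too few fields; a real checker might report these too
        if row[index] not in ALLOWED:
            yield reader.line_num, row[index]
```

Because `csv.reader` strips the quotes, the check is a plain set lookup, and a quoted field with a comma inside it still counts as one field.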

LSLGuy · October 5, 2006, 2:39am

I’m not a real regex guru, but it sure looks to me like everybody’s examples to date will fail if any of the earlier field entries are of the format “abc,def”.

It looks to me like they’d miscount a single field “abc,def” as if it was two fields.

If the source file grammar has the " wrappers for field values optional it gets even messier. Omphaloskeptic’s fine entry covers for this case on the 7th field. But I think you’d want similar logic on fields 1-6.

Omphaloskeptic · October 5, 2006, 3:29am

Good point; I’d been thinking of the commas as outer delimiters, but if quotes can quote commas then [^,]* won’t work. You’d need something like
(("[^"]*"|[^",]*|),){7}
and there would be some changes to the pattern for the field of interest as well.
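
Here is one way that could look, sketched in Python (my assumption; any engine with the same syntax should behave similarly). The names `FIELD`, `QUOTE_AWARE`, `NAIVE`, and `eighth_field` are made up, and the test line is hypothetical. The sketch does not handle doubled quotes inside a quoted field.

```python
import re

# One field: either a quoted string (which may contain commas)
# or an unquoted run containing no commas or quotes.
FIELD = r'(?:"[^"]*"|[^",]*)'

QUOTE_AWARE = re.compile(r'^(?:' + FIELD + r',){7}(' + FIELD + r')(?:,|$)')
NAIVE = re.compile(r'^(?:[^,]*,){7}([^,]*)')

def eighth_field(line, pattern=QUOTE_AWARE):
    """Return the raw field after the 7th delimiter comma, or None."""
    m = pattern.match(line)
    return m.group(1) if m else None

# Hypothetical line whose second field contains a comma inside the quotes.
line = '"1","abc,def","Level","Practice","","","","X","01/01/1900"'
print(eighth_field(line))         # prints: "X"
print(eighth_field(line, NAIVE))  # prints: ""
```

The naive version counts the comma inside "abc,def" as a delimiter and lands one field early; the quote-aware version skips over it.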
