Regular Expression question

Does anyone know of a Regex that will match all spaces that are not within matching quotes? I’m having trouble with this one.

TIA!

–FCOD

Regular expression syntax is often language specific. So you’d get a better answer if you said what language you are using.

Never mind. I found an alternate solution.

–FCOD

I haven’t thought about this long and hard, but I think the “matching” part is what will confound you. I am assuming you need to match on both single quotes and double quotes. I do not know of any construct in regexes that is dynamic, that is, one where part of the expression is evaluated based on what another part to the left of it matched.

Are you trying to just grep for this pattern, or are you trying to edit the text so that the spaces not in quotes become something else? There is a convoluted approach you could take for editing them out; I’m not sure if there’s an elegant one.

In Perl, you can do such constructions with backreferences. E.g.



/(['"])(.*?)\1/


Would match a ' or a ", followed by anything (as little as possible, since .*? is lazy), followed by whatever was matched by the first part. If you wanted to filter out all whitespace that was not inside such a construct, well, that’s a real pain in the ass to do with a regex (even in Perl).
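For what it's worth, here is a rough sketch of one way to do that filtering, in Python rather than Perl. It reuses the backreference pattern above as one alternative and a whitespace run as the other, then keeps the quoted matches and drops the rest. The names TOKEN and strip_unquoted_whitespace are made up for this example, and the re.DOTALL flag (letting a quoted run span lines) is my assumption, not something the thread specifies.

```python
import re

# First alternative: an opening ' or ", lazily anything, then the same quote (\1).
# Second alternative: a run of whitespace that is not part of a quoted run.
TOKEN = re.compile(r"""(['"]).*?\1|\s+""", re.DOTALL)

def strip_unquoted_whitespace(text: str) -> str:
    """Delete whitespace that is not inside a matched pair of quotes."""
    def keep_quoted(match: re.Match) -> str:
        # Group 1 is set only when the quoted alternative matched.
        return match.group(0) if match.group(1) else ""
    return TOKEN.sub(keep_quoted, text)

print(strip_unquoted_whitespace("""a b "c d" e 'f "g" h' i"""))
```

Because the scan runs left to right, a quoted run is consumed whole before any whitespace inside it can be seen. A quote with no partner is simply skipped, so whitespace after it is treated as unquoted.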

For all your RegEx needs.

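This will find all spaces not within single quotes. The leading [ ] is a literal space; without it the lookahead on its own only matches empty positions. The lookahead checks that an even number of quotes follows the space.


[ ](?=(?:[^\']*\'[^\']*\')*(?![^\']*\'))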

And this will work for double quotes, again with a literal space in front of the lookahead.


[ ](?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))
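If you want to try the lookahead idea, here is a small sketch using Python's re module, with a literal space placed in front of each lookahead so that the space itself is what gets matched. The names SINGLE and DOUBLE are just labels for this example.

```python
import re

# A space followed by an even number of quote characters up to the end of the text.
SINGLE = re.compile(r" (?=(?:[^']*'[^']*')*(?![^']*'))")
DOUBLE = re.compile(r' (?=(?:[^"]*"[^"]*")*(?![^"]*"))')

# Replace the matched spaces with underscores to make them visible.
print(SINGLE.sub("_", "a b 'c d' e"))
print(DOUBLE.sub("_", 'x "y z" w'))
```

Each pattern only knows about one kind of quote, and it decides by counting the quotes that follow. A stray unpaired quote (an apostrophe in "it's", say) therefore makes every space before it look quoted.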

It’s actually pretty simple to design the non-deterministic finite state machine that processes this particular language.


state 0        space           state 1
state 0        "               state 2
state 0        '               state 3
state 0        other           state 0
state 1        ε               state 0
state 2        "               state 0
state 2        other           state 2
state 3        '               state 0
state 3        other           state 3

State 0 is the initial state, state 1 is the accepting state (where accepting in this case means performing some action). There are algorithms that can translate this into a regex (depending, of course, on what you want that action to be), but honestly, it’s just as easy to write this one directly.
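Written directly, the table above might look something like this in Python. This is only a sketch: the function name spaces_outside_quotes is invented here, and the "action" is assumed to be recording the index of the space. State 1 and its ε move back to state 0 are folded into a single step.

```python
from typing import List

def spaces_outside_quotes(text: str) -> List[int]:
    """Return the indices of spaces that are not inside '...' or "..."."""
    OUTSIDE, IN_DOUBLE, IN_SINGLE = 0, 2, 3  # state numbers from the table
    state = OUTSIDE
    hits = []
    for i, ch in enumerate(text):
        if state == OUTSIDE:
            if ch == " ":
                hits.append(i)  # state 1: perform the action, then fall back to 0
            elif ch == '"':
                state = IN_DOUBLE
            elif ch == "'":
                state = IN_SINGLE
        elif state == IN_DOUBLE:
            if ch == '"':
                state = OUTSIDE
        elif state == IN_SINGLE:
            if ch == "'":
                state = OUTSIDE
    return hits

print(spaces_outside_quotes("a b 'c d' \"e f\" g"))
```

Unlike the single-quote-type lookahead patterns, this handles both kinds of quotes in one pass and ignores a quote of one kind that appears inside the other. An unterminated quote leaves the machine in state 2 or 3, so everything after it counts as quoted.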

I haven’t seen backreferences in the search expression before; I have only seen them used in the replacement string. Is that standard regex or specific to Perl?

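Backreferences are a standard part of basic regexes: POSIX basic regular expressions (the kind grep and sed use by default) allow \1 through \9 in the search pattern. Note, however, that they tend to compromise performance considerably, because the engine has to backtrack rather than run as a simple state machine, so it is probably good that OP found an alternate solution.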
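To make the two uses concrete, here is a tiny sketch in Python, whose re module accepts \1 in both places. The pattern name doubled and the sample sentence are invented for the example.

```python
import re

# Backreference inside the search pattern: \1 must repeat what group 1 matched.
doubled = re.compile(r"\b(\w+) \1\b")
print(doubled.search("it was was fine").group(0))

# Backreference inside the replacement string: \1 inserts what group 1 matched.
print(doubled.sub(r"\1", "it was was fine"))
```

The first use constrains what can match; the second only copies text that has already matched.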