I want to match, basically, “any printable character except <whatever>.” Whatever changes depending on exactly what I’m matching (for instance, any printable except ‘<’). Is there any way to specify [:print:], except ‘<’? I tried [[:print:]^<], but it didn’t seem to work correctly. Is there any way to do this, or do I just have to specify the range of printables except that character manually?
I’m not sure about POSIX, but in many flavors [^xyz] means any character other than x, y, or z.
Edited: Never mind, I didn’t understand your original question because of the smileys.
I don’t know how to do this in POSIX. With .NET regular expressions, you can use character class subtraction, so for example [\w-[ab]] means any word character (letter, digit, or underscore) except “a” or “b”. I don’t see this syntax in a cheat sheet for POSIX. You should be able to do it using a lookahead assertion.
Yeah, sorry, I forgot that :p gets rendered as a smiley. Basically I’m asking (if we’re using Perl) whether there’s a way to say (using alnum this time):
\w except “m.” Just a way to take an already constructed character class and REMOVE an element (or a handful of elements) from it, rather than building it from the ground up. The only real difference I’ve noticed between Perl and POSIX regexps is the way you specify the default character classes; the syntax is otherwise almost the same.
Hmm, maybe I will have to use lookahead. A pain, but I guess I have to work with it… damn flex/bison.
You could just do [A-LN-Za-ln-z0-9_]
Oh, I know, I was just hoping for something prettier and more self explanatory looking.
Using a negative lookahead, you could try:
(?!m)\w
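To show what that lookahead does, here is a minimal sketch in Python. Python's re module is only a stand-in for a Perl-style engine (an assumption on my part, since flex does not support lookahead this way), and the names WORD_EXCEPT_M and keep_matches are made up for the example.

```python
import re

# Assumption: Python's re module stands in for a Perl-style engine here.
# re.ASCII limits \w to [A-Za-z0-9_] so the result is easy to check by eye.
WORD_EXCEPT_M = re.compile(r"(?!m)\w", re.ASCII)

def keep_matches(text):
    """Return the characters of text that match \\w but are not 'm'."""
    # The lookahead rejects a position if the next character is 'm';
    # otherwise \w consumes exactly one word character.
    return "".join(WORD_EXCEPT_M.findall(text))

result = keep_matches("hammer_time 42!")
print(result)
```

Note that this only excludes lowercase “m”, whereas the hand-written class [A-LN-Za-ln-z0-9_] also leaves out “M”.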
It looks like flex itself supports a character class subtraction syntax that standard POSIX-style regexps lack, so [a-c]{-}[b-z] comes out as “a”. Oddly, the operator is a minus sign wrapped in curly brackets rather than a plain “-”… for some reason.
I discovered this while poring over the manual for something completely unrelated, in a completely unrelated section.
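This is not flex code, just a quick Python sketch of the set arithmetic behind the subtraction example above (a through c, minus b through z), to confirm what is left over. The helper char_range is hypothetical and exists only for this illustration.

```python
def char_range(first, last):
    """Hypothetical helper: the set of characters from first to last, inclusive."""
    return {chr(c) for c in range(ord(first), ord(last) + 1)}

# Everything in a-c that is not also in b-z.
difference = char_range("a", "c") - char_range("b", "z")
print(sorted(difference))
```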
Or you could just declare one with an exhaustive list of every printable character except <whatever>. Ugly, sure. But if it has to be done by lunch, then it might be worth it.
There are only finitely many characters.
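If typing the list by hand is too ugly, it can be generated. A rough Python sketch, assuming “printable” means ASCII 0x20 through 0x7E (what [:print:] covers in the C locale); the function name printable_except is made up for the example.

```python
import re

def printable_except(excluded):
    """Build a character class of ASCII printables minus the excluded ones.

    Assumption: "printable" means 0x20 through 0x7E.
    """
    chars = [chr(c) for c in range(0x20, 0x7F) if chr(c) not in excluded]
    # re.escape guards characters such as ], \, ^ and - inside the brackets.
    return "[" + "".join(re.escape(ch) for ch in chars) + "]"

# One or more printable characters, stopping at the first '<'.
pattern = re.compile(printable_except("<") + "+")
match = pattern.match("a b<c")
print(match.group(0))
```

The generated class is written for Python's re module; for flex or another engine the escaping rules inside brackets may differ, so the output would need checking before pasting it elsewhere.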
Can you do it in two steps? First allow every character in the class, then run the result through another regexp that disallows the characters you don’t want. My Perl is rusty, but in a shell script I might do something like this:
grep '\w' | grep -v m
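The same two-step idea, sketched in Python rather than shell. The function name words_without is made up for the example, and re.ASCII is my assumption to keep \w to plain letters, digits, and underscore.

```python
import re

def words_without(text, banned):
    """Two passes: find runs of \\w, then drop any run containing banned."""
    # Step 1: allow the whole class.
    candidates = re.findall(r"\w+", text, re.ASCII)
    # Step 2: reject anything that contains the unwanted character.
    return [word for word in candidates if banned not in word]

print(words_without("foo bar memo baz_9", "m"))
```

Like the grep pipeline, this throws away the whole item when the unwanted character shows up, which is not quite the same as a character class that simply stops matching at that character.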