I, unfortunately, never really learned regexps, and now I’m sort of in dire need of them. I could do this without them, but it would be a general PITA. I basically need to make a delimiter that matches any character NOT in a given set. \W gets close, but the things I don’t want to delimit on include a few special characters like apostrophes and hyphens, so I need something like “all the characters NOT in \w'-” (that’s not every character I need, but it should give you an idea). You’d think it’d be easy to Google this, but it’s not. Maybe that’s because I don’t know how to use regexps…
[^abcd] means “match any character except a, b, c or d”.
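So something like [^\w'-] should do what you want. Keep the hyphen last (or escape it) so it is read as a literal hyphen rather than as the start of a range.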
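As a rough illustration of the negated class above used as a delimiter, here is a minimal sketch. The thread never says which language is in use, so Python's `re` module is an assumption here, and `split_words` and the sample sentence are made up for the example.

```python
import re

# Delimiter: one or more characters that are NOT word characters,
# apostrophes, or hyphens. The hyphen is last so it is taken literally.
DELIMITER = re.compile(r"[^\w'-]+")

def split_words(text):
    """Split text on the delimiter (hypothetical helper for illustration)."""
    # Drop empty strings produced by leading or trailing delimiters.
    return [tok for tok in DELIMITER.split(text) if tok]

words = split_words("Don't panic: it's a well-known trick!")
print(words)
```

Apostrophes and hyphens stay inside the tokens, while spaces and other punctuation act as separators.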
If a character class delimited with square brackets begins with a caret, it means match something not in the class:
[^,;:]
Matches a single character which is not a comma, semicolon, or colon, for instance. A hyphen (-) may also be used to indicate ranges:
[^A-G1-5]
Matches a single character that is not an uppercase A through G or a digit 1 through 5.
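To see the negated range in action, here is a small sketch, again assuming Python's `re` module; the input string is invented for the example.

```python
import re

# Matches any single character that is NOT in A-G and NOT in 1-5.
NOT_A_TO_G_OR_1_TO_5 = re.compile(r"[^A-G1-5]")

# findall returns each matching character; join them back together
# to see which characters survive the filter.
kept = "".join(NOT_A_TO_G_OR_1_TO_5.findall("ABCHIJ12367"))
print(kept)
```

Only the characters outside both ranges (H, I, J, 6, 7) are matched.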
Ah, I got confused because “^” seems to also be an anchor meaning “begins with.” What’s the difference between these usages? I assume it’s something like
^[A-G] means “starts with A-G”
while
[^A-G] means “not A-G”
Is that right?
Yep.
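The two meanings of the caret can be compared side by side. This is only a sketch, assuming Python's `re` module, with made-up test strings: outside the brackets the caret anchors the match to the start of the input, and as the first character inside the brackets it negates the class.

```python
import re

starts_with_a_to_g = re.compile(r"^[A-G]")  # caret as anchor
not_a_to_g = re.compile(r"[^A-G]")          # caret as negation

# Anchor: the first character must be in A-G.
anchor_hit = bool(starts_with_a_to_g.search("Apple"))
anchor_miss = bool(starts_with_a_to_g.search("zebra"))

# Negation: find the first character anywhere that is not in A-G.
first_outside = not_a_to_g.search("ABCxyz").group()
none_outside = not_a_to_g.search("ABC")

print(anchor_hit, anchor_miss, first_outside, none_outside)
```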
So, I’m trying to make a delimiter that will separate paragraphs based on whether they have one or more blank lines separating them (in other words, 2 or more newline characters). I figured:
\n{2,}
would work. When that failed, I tried
[\n]{2,}
which also didn’t work. Am I missing something? Is the scanner just being buggy?
Never mind, forgot Windows uses carriage returns. Stripping the file of carriage returns with a simple program (I only have to handle Unix-formatted files) worked perfectly.
Are you sure the input has just newlines (\n) and not CRLFs (\r\n)?
ETA: never mind then.
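For anyone who can't strip the carriage returns first, one option is a delimiter that accepts either line ending. The sketch below assumes Python's `re` module rather than whatever scanner the original poster was using, and `split_paragraphs` and the sample text are hypothetical.

```python
import re

# Two or more consecutive line endings, each either "\n" or "\r\n".
PARAGRAPH_BREAK = re.compile(r"(?:\r?\n){2,}")

def split_paragraphs(text):
    """Split text into paragraphs on blank lines (illustrative helper)."""
    # strip() removes leading/trailing whitespace so no empty paragraph
    # appears at either end.
    return [p for p in PARAGRAPH_BREAK.split(text.strip()) if p]

unix_text = "First paragraph.\n\nSecond one,\nstill second.\n\n\nThird."
windows_text = unix_text.replace("\n", "\r\n")

unix_paragraphs = split_paragraphs(unix_text)
windows_paragraphs = split_paragraphs(windows_text)
print(len(unix_paragraphs), len(windows_paragraphs))
```

A single line break inside a paragraph is left alone in both cases. Note that a "blank" line containing spaces or tabs would not count as a break with this pattern.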