This takes an input file and prints it to standard out; I usually redirect it to a file after I determine that it works.
That handles all those “exceptional” classes. But how can I invert that to come up with my other search and replace? There are way too many numbers to put them all in like I did with that one…
It’s not elegant, but you can use one regex to make sure that you have a tag of the form <p class=Style(\d\d\d)>, and then use another regex to decide whether that three-digit sequence is in that list. If so, replace it with Blockquote; if not, replace it with Normal.
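Here is a rough sketch of that two-step idea, in case it helps. The list of exceptional numbers (12, 47, 305) and the name `restyle` are made up for illustration, since the real list isn't shown in this thread. I'm also assuming that "replace it" means swapping the class name, and I've used \d{1,3} so that unpadded numbers like Style1 match as well.

```perl
use strict;
use warnings;

# Hypothetical list of "exceptional" style numbers; substitute the real ones.
my %is_blockquote = map { $_ => 1 } (12, 47, 305);

# Step 1: the regex finds any <p class=StyleN> tag and captures N.
# Step 2: the /e replacement looks N up in the hash to pick the new class.
sub restyle {
    my ($html) = @_;
    $html =~ s{<p class=Style(\d{1,3})>}
              {'<p class=' . ($is_blockquote{$1} ? 'Blockquote' : 'Normal') . '>'}ge;
    return $html;
}

print restyle("<p class=Style12>quoted</p> <p class=Style13>plain</p>\n");
# prints: <p class=Blockquote>quoted</p> <p class=Normal>plain</p>
```

The hash lookup keeps the long list of numbers out of the regex itself, which is the main point of splitting the job in two.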
Thanks, friedo! I had found the zero-width negative look-ahead operator before, but I left off the trailing \d{1,3}. (Just \d{3} won’t do it because, as I forgot to mention, the styles go from Style1 to Style652 rather than Style001.) Without it, the regex was only matching the Style text and leaving the numbers unchanged, so I thought I was misunderstanding “zero-width negative look-ahead.” But of course, I just had it wrong. Thanks!
ultrafilter, I thought about doing something like that, but it’s been so long since I used Perl that I’ve completely forgotten it, so I couldn’t tell how to put the output of one regex into the next and still output the whole file the way my current search-and-replace statement does.
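For anyone reading later, a minimal sketch of the look-ahead version might look like the following. It is not necessarily the exact pattern friedo posted, and the exceptional numbers (12, 47, 305) and the name `normalize_others` are placeholders of mine.

```perl
use strict;
use warnings;

# Hypothetical exceptional numbers, joined into an alternation: 12|47|305
my $exceptional = join '|', (12, 47, 305);

sub normalize_others {
    my ($html) = @_;
    # (?!...) is zero-width: it rejects the match when an exceptional number
    # followed by ">" comes right after "Style", but it consumes nothing.
    # The trailing \d{1,3} is what actually matches the digits, so they get
    # replaced along with the rest of the tag.
    $html =~ s/<p class=Style(?!(?:$exceptional)>)\d{1,3}>/<p class=Normal>/g;
    return $html;
}

print normalize_others("<p class=Style12>keep</p> <p class=Style123>change</p>\n");
```

Including the ">" inside the look-ahead matters: without it, Style123 would be skipped just because it begins with 12.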
since there won’t be any of the exceptional values left to match.
Also, the (.*?) in your regexp won’t match multiline constructs (e.g., where the <p> and </p> tags are on separate lines) unless you use the /s modifier. You may want to use /i as well (or [Pp]), if the tags might be <P>…</P>.
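To make the difference concrete, here is a small sketch. The sample string and the helper names are invented for the demonstration, and the pattern is only a guess at the shape of the original regexp.

```perl
use strict;
use warnings;

# Sample paragraph with uppercase tags and a newline inside the body.
my $html = "<P class=Style7>first line\nsecond line</P>";

# Without /s, "." stops at a newline, so (.*?) cannot reach the closing tag.
sub body_without_s {
    my ($text) = @_;
    my ($body) = $text =~ m!<p class=Style\d{1,3}>(.*?)</p>!i;
    return $body;    # undef when the pattern does not match
}

# With /s, "." also matches newlines; /i accepts <P> as well as <p>.
sub body_with_s {
    my ($text) = @_;
    my ($body) = $text =~ m!<p class=Style\d{1,3}>(.*?)</p>!si;
    return $body;
}

print defined body_without_s($html) ? "matched\n" : "no match without /s\n";
print body_with_s($html), "\n";
```

Note that /s only helps if the whole paragraph is in the string being matched; reading the file one line at a time still splits the tags apart.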
Something I find useful for these quick search-and-replace operations is perl -pe <perl-command> <file>. This sets $_ equal to each line of <file> and runs <perl-command> on it, then prints the result, so you can try different things quickly. When you get it to work, use perl -p -i.bak -e, which then saves the original file as <file>.bak and puts the modifications in <file>.
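If it helps to see what -p is doing behind the scenes, it is roughly the loop below. This is a sketch only: `filter_lines` is a name I made up, and in-memory filehandles stand in for the real file and standard out.

```perl
use strict;
use warnings;

# Roughly what "perl -pe CODE" does: read each line into $_, run CODE on it,
# then print whatever $_ holds afterwards.
sub filter_lines {
    my ($in, $out, $code) = @_;
    local $_;    # avoid clobbering the caller's $_
    while (<$in>) {
        $code->();
        print {$out} $_;
    }
}

# In-memory filehandles stand in for <file> and STDOUT.
my $input = "<p class=Style3>a</p>\n<p class=Style40>b</p>\n";
open my $in, '<', \$input or die "cannot open input: $!";
my $result = '';
open my $out, '>', \$result or die "cannot open output: $!";

filter_lines($in, $out, sub { s/class=Style\d{1,3}/class=Normal/ });
close $out;
print $result;
```

Because each pass through the loop sees only one line, a pattern can never match across a line break here, whatever modifiers it carries.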
Thanks, Omphaloskeptic. I was doing perl -pe at first, but I couldn’t get it to work with multiple lines (as you mention), so I switched to a full Perl program. Then I realized I didn’t really WANT my file to be multiline, so I consolidated it so that every statement in the HTML file is on a single line, separated by two CR-LFs, and just turned on line wrapping in my editor. That way I never have to worry about the multiline problem. I appreciate the extra information for the future, though; undoubtedly it will come in handy at some point, probably soon.