#1  chorpler  04-08-2009, 07:38 PM

Regular expression: matching "everything but"


I'm editing a big HTML file with a bunch of CSS classes. I want to convert everything that says:

<p class=StyleXXX>

where XXX is a number from 1-652, into

<p class=Normal>

Except for classes 115, 121, 122, 130, 134, 137, 140, 146, 147, 148, and 149, which I want to convert to

<p class=Blockquote>

I figured out how to match up those classes with the following regex-using perl program:

Code:
open(INPUT,"<$ARGV[0]") or die;
@input_array=<INPUT>;
close(INPUT);
$input_scalar=join("",@input_array);

$input_scalar =~ s/(<p class=Style(115|121|122|130|134|137|140|146|147|148|149)>)(.*?)(<\/p>)/<p class=Blockquote>$3<\/p>/g;

print($input_scalar);
This takes an input file and prints it to standard out; I usually redirect it to a file after I determine that it works.

That handles all those "exceptional" classes. But how can I invert that to come up with my other search and replace? There are way too many numbers to put them all in like I did with that one...

Thanks for any help.
#2  friedo  04-08-2009, 07:45 PM
Perl supports a zero-width negative look-ahead assertion (which also happens to be my favorite instance of regex jargon.)

So you can do this:

Code:
s/<p class=Style(?!(115|121|122|130|134|137|140|146|147|148|149))\d{3}>/<p class=Normal>/g;
What this says is:

1. Match <p class=Style
2. not followed by the list of numbers
3. Followed by three digits

I think that should do it.
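
To make the three steps above concrete, here is a minimal sketch run against a made-up four-line sample. The sample input and the variable names ($html, $exceptions, $converted) are assumptions for illustration, not part of the original file. It also puts the closing > inside the look-ahead, which is a small tightening of the one-liner above so that only the exact three-digit numbers are excluded.

```perl
use strict;
use warnings;

# Hypothetical sample input; the paragraph contents are made up.
my $html = join "\n",
    '<p class=Style101>plain</p>',
    '<p class=Style115>quoted</p>',
    '<p class=Style149>quoted too</p>',
    '<p class=Style650>plain again</p>';

# Exceptional class numbers, as listed in the original question.
my $exceptions = '115|121|122|130|134|137|140|146|147|148|149';

# (?!...) succeeds only when the alternation does NOT match at this position.
# It consumes no characters, so \d{3} still has to match the digits itself.
(my $converted = $html)
    =~ s/<p class=Style(?!(?:$exceptions)>)\d{3}>/<p class=Normal>/g;

print "$converted\n";
```

Style101 and Style650 become Normal here, while Style115 and Style149 are left alone for the separate Blockquote substitution.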

BTW, you are using strict and warnings, right?
#3  ultrafilter  04-08-2009, 07:47 PM
It's not elegant, but you can use one regex to make sure that you have a tag of the form <p class=Style(\d\d\d)>, and then use another regex to decide whether that three-digit sequence is in that list. If so, replace it with Blockquote; if not, replace it with Normal.
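
One possible way to wire up that two-step idea in a single pass is the /e modifier, which evaluates the replacement as Perl code. This is only a sketch: it swaps the second regex for a hash lookup, uses \d{1,3} rather than exactly three digits, and the helper new_class, the hash %is_blockquote, and the sample input are all hypothetical names invented for the example.

```perl
use strict;
use warnings;

# Lookup table of the exceptional class numbers from the question.
my %is_blockquote = map { $_ => 1 }
    qw(115 121 122 130 134 137 140 146 147 148 149);

# Hypothetical helper: choose the new class name for one captured number.
sub new_class {
    my ($number) = @_;
    return $is_blockquote{$number} ? 'Blockquote' : 'Normal';
}

# Made-up sample input.
my $html = '<p class=Style7>a</p><p class=Style121>b</p><p class=Style652>c</p>';

# Step 1: match any StyleN tag and capture N.
# Step 2: /e runs the replacement as code, which does the list lookup.
(my $converted = $html)
    =~ s/<p class=Style(\d{1,3})>/'<p class=' . new_class($1) . '>'/ge;

print "$converted\n";
```

Because the substitution runs over the whole string with /g, the rest of the file passes through unchanged and can still be printed in one go.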
#4  chorpler  04-08-2009, 07:59 PM
Thanks, friedo! I had found the zero-width negative look-ahead operator before, but I left off the trailing \d{1,3} (just \d{3} won't do it because, as I forgot to mention, the styles go from Style1 to Style652, rather than Style001) and it was only matching the Style text and was leaving the numbers unchanged, so I thought I was misunderstanding "zero-width negative look-ahead." But of course, I just had it wrong. Thanks!
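
For anyone hitting the same confusion, this small sketch compares the three variants: no trailing digits, \d{3}, and \d{1,3}. The sample string and variable names are made up for illustration, and the > inside the look-ahead is an added assumption so that only exact class numbers are excluded.

```perl
use strict;
use warnings;

my $exceptions = '115|121|122|130|134|137|140|146|147|148|149';

# Made-up sample: one-, two-, and three-digit classes plus one exception.
my $html = '<p class=Style7><p class=Style42><p class=Style115><p class=Style300>';

# No trailing digits: the look-ahead consumes nothing, so only the text
# "<p class=Style" is replaced and the digits stay behind.
(my $no_digits = $html) =~ s/<p class=Style(?!$exceptions)/<p class=Normal/g;

# Exactly three digits: Style7 and Style42 are skipped.
(my $three = $html)
    =~ s/<p class=Style(?!(?:$exceptions)>)\d{3}>/<p class=Normal>/g;

# One to three digits: every non-exceptional class is converted.
(my $one_to_three = $html)
    =~ s/<p class=Style(?!(?:$exceptions)>)\d{1,3}>/<p class=Normal>/g;

print "$no_digits\n";     # <p class=Normal7><p class=Normal42><p class=Style115><p class=Normal300>
print "$three\n";         # <p class=Style7><p class=Style42><p class=Style115><p class=Normal>
print "$one_to_three\n";  # <p class=Normal><p class=Normal><p class=Style115><p class=Normal>
```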

ultrafilter, I thought about doing something like that, but it's been so long since I used Perl that I've completely forgotten it, so I couldn't tell how to feed the output of one regex into the next and still output the whole file the way my current search-and-replace statement does.
#5  Omphaloskeptic  04-09-2009, 12:29 AM
Alternately, if you do all of the exceptional replacements first, then you should just be able to follow them with a catchall
Code:
$input_scalar =~ s/(<p class=Style\d{1,3}>)(.*?)(<\/p>)/<p class=Normal>$2<\/p>/g;
since there won't be any of the exceptional values left to match.

Also, the (.*?) in your regexp won't match multiline constructs (e.g., where the <p> and </p> tags are on separate lines) unless you use the /s modifier. You may want to use /i as well (or [Pp]), if the tags might be <P>...</P>.
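
Putting the ordering and the modifiers together, a rough sketch might look like this. The sample input is made up, with one paragraph split across two lines and written with upper-case tags, and the variable names are assumptions. Note that the replacement text writes the tags back in lower case.

```perl
use strict;
use warnings;

# Made-up input: the second paragraph spans two lines and uses <P>...</P>.
my $html = "<p class=Style115>quote</p>\n<P class=Style20>line one\nline two</P>\n";

my $exceptions = '115|121|122|130|134|137|140|146|147|148|149';

# Pass 1: the exceptional classes become Blockquote.
# /s lets . match newlines; /i makes the tag match case-insensitively.
$html =~ s/<p class=Style(?:$exceptions)>(.*?)<\/p>/<p class=Blockquote>$1<\/p>/gsi;

# Pass 2: the catch-all. No exceptional Style tags remain after pass 1,
# so everything still named StyleN becomes Normal.
$html =~ s/<p class=Style\d{1,3}>(.*?)<\/p>/<p class=Normal>$1<\/p>/gsi;

print $html;
```

The order matters: running the catch-all first would turn the exceptional paragraphs into Normal before the Blockquote pass ever saw them.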

Something I find useful for these quick search-and-replace operations is perl -pe <perl-command> <file>. This sets $_ equal to each line of <file> and runs <perl-command> on it, then prints the result, so you can try different things quickly. When you get it to work, use perl -p -i.bak -e, which then saves the original file as <file>.bak and puts the modifications in <file>.
#6  Reply  04-09-2009, 01:10 AM
Wow. Regexes rule. The Dope rules more!

:worship:
#7  chorpler  04-09-2009, 01:51 AM
Thanks Omphaloskeptic. I was doing perl -pe at first, but I couldn't get it to work with multiple lines (as you mention), so I switched to a full perl program. Then I realized I didn't really WANT my file to be multiline, so I consolidated it so that every statement in the HTML file is on a single line, separated by two CR-LF's, and just turned on line wrapping in my editor. That way I didn't have to ever worry about the multiline problem. I appreciate the extra information for the future, though -- undoubtedly it will come in handy at some point, probably soon.