So I’m working today, and come across a bug (?) in an old perl script I’ve been asked to maintain. The script reads data from an input file, parses data elements out of the line string, and does stuff with the parsed data. It contains the code:
foreach $line (<INPUT_FILE>) {
    chomp($line);
    print "$line\n";
**$line=~s///;**
    print "$line\n";
    $line=~/^(.*)_prj.*\@(.*?)\ (.*)_/;
    print "$line\n";
    $a=$1;
    $b=$2;
    $c=$3;
    # Do stuff with $a $b and $c
}
I’m a little confused by the bolded line (the print statements are from my debugging). It doesn’t seem to do anything the first time through the loop, but on a second iteration it strips off everything after the first “_” character in the line (which is before the one matched by “_prj”). That obviously leads to issues parsing $a, $b, and $c. This bug seems to have stayed hidden for a while, since the input file this script reads only has a single line 99% of the time.
My best guess is this is a bug where the previous script writer intended to strip out a Carriage Return byte, and that byte got lost in the match section during a windows-to-unix transfer. Commenting out that line makes the script work the way it’s supposed to.
But what is “=~s///” supposed to do? At first glance, it seems to be “match nothing, replace that with nothing”, a null operation. But it does something, and it appears to be keyed off of the pattern match of the previous iteration. If I comment out that match, then the s/// doesn’t do anything the next time through the loop.
Right now, I’m going under the assumption that the “empty” substitution isn’t needed for my script. But I’m trying to understand what that is supposed to do (if anything), or if I’ve wandered off into an “undefined behavior” area of Perl. It can be hard to tell whether weird behavior in a Perl script is by design or not.
If it matters, this script is running with perl 5.6.1 on Sun Solaris 5.8.
My guess would be that the original coder was called away just as he was about to put something in there, then forgot what he was doing, and it stayed. As to it only doing something once and only in a particular instance, well… welcome to the wonderful world of interpreted languages. “Our bugs are your bugs!”
I just implemented this to see what would happen. With version 5.8.5, the s/// cuts off everything before the last _. However, the regex match still works: $a, $b, and $c contain what they’re supposed to. I have no idea how that’s possible.
Well, thanks for the confirmation on the WTF factor of this.
I think the code I originally posted was too simplified to quite display the behavior I was seeing in the full script. The code below does recreate the weirdness:
foreach $line (<DATA>)
{
    chomp($line);
    print "line1: \"$line\"\n";
    $line=~s///;
    print "line2: \"$line\"\n";
    $line=~/^(.*)_prj.*\@(.*?)\ (.*)_/;
    print "line3: \"$line\"\n";
    $cc_prj=$1;
    $pvob=$2;
    $activity=$3;
    # Check for Windows style vob mount (\vobName), replace
    # with Unix mount point (/cc/vobs/vobName)
    $pvob=~s/^\\/\/cc\/vobs\//;
    $activity=~s/\\/\/cc\/vobs\//;
    # derive the user from the deliver.username_date in the activity
    $user=$activity;
    $user=~s/deliver\.//;
    $user=~s/_.*$//;
    print "cc_prj: \"$cc_prj\"\n";
    print "pvob: \"$pvob\"\n";
    print "activity: \"$activity\"\n";
    print "user: \"$user\"\n";
    print "\n";
}
__DATA__
firstProject_1.0_prj_integration@\vob1_pvob deliver.user1_firstProject_1.0_prj.20060420.135133@\vob1_pvob_
secondProject_2.0_prj_integration@/cc/vobs/vob2_pvob deliver.user2_secondProject_2.0_prj.20060420.135137@/cc/vobs/vob2_pvob_
{if you think that data syntax looks familiar, yes I am dealing with Clearcase delivery reports}
Given the change required to trigger the weirdness, I suspect that the =~s/// call re-uses the last regex that was executed. In my case, that’s the =~s/_.*$// done on the $user variable in the previous loop iteration. The first time through the loop there was no prior regex, so the =~s/// does nothing. Who knows whether that’s an intended shortcut (to apply the same regex across multiple variables without retyping it, although that seems overly obfuscated) or just an internal Perl variable not getting unset.
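For anyone who wants to see this in isolation, here is a minimal sketch of what I think is happening. As far as I can tell from perlop, an empty pattern makes Perl reuse the last successfully matched regular expression. The sub name reuse_last_pattern and the sample strings are made up for illustration; they are not from the real script.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): run one real
# substitution, then an "empty" substitution on a different string.
sub reuse_last_pattern {
    my ($first, $second) = @_;

    # A successful match: Perl remembers this pattern.
    $first =~ s/_.*$//;

    # Empty pattern: instead of matching "nothing", Perl reuses the
    # last successfully matched pattern, here /_.*$/.
    $second =~ s///;

    return ($first, $second);
}

my ($user, $line) = reuse_last_pattern("user1_firstProject", "secondProject_2.0_prj");
print "user: \"$user\"\n";   # user: "user1"
print "line: \"$line\"\n";   # line: "secondProject"
```

The second string loses everything from its first underscore onward, which matches what I saw on the second pass through the loop.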
I agree that this was probably introduced by a “write some incomplete code, went to look something up, got distracted and forgot where I was” situation. The fixed script is up and running, now I’m just trying to figure out this weird Perl behavior.
Huh, that’s more in-depth than the perl references I was checking. That’ll teach me the error of not going straight to the source. It certainly makes sense now given what I was seeing.
You’re an evil, evil person.
Thanks, all, for clearing up (and sharing) one bit of confusion in my day.
I picked up 3rd Edition camel, and was getting ready to crack it open, but the explanations here sound accurate and remind me of a bug I hunted down with the same origin. If you want to leave it in as a shortcut, you should put in a comment that explains that you’re modifying the variable $_, Perl’s equivalent of the pronoun “that”.
You can also include a reference to this page which explains the behavior of $_ - some of the nastiest bugs occur when it carries a value into a place you didn’t expect.
After a little bit of testing, it appears that scoping is significant. Out-of-scope regular expression matches are behaving like there is some regular expression stack.
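Here is a rough sketch of the kind of test I mean, assuming the "last successful pattern" is scoped the same way as $1 and friends (restored when the enclosing block or sub exits). The sub names strip_dash_suffix and scoped_demo and the sample strings are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: performs its own successful match internally.
sub strip_dash_suffix {
    my ($text) = @_;
    $text =~ s/-.*$//;
    return $text;
}

# Hypothetical driver: match in this scope, call the helper, then use
# an empty pattern and see which regex it picks up.
sub scoped_demo {
    my ($value) = @_;

    # Successful match in this scope; remembered as the last pattern here.
    "q_r" =~ /_.*$/ or die "setup match failed";

    # The helper's own match happens in an inner scope and should not
    # replace the pattern remembered in this one.
    my $ignored = strip_dash_suffix("x-y");

    # Empty pattern: reuses /_.*$/ from this scope, not /-.*$/ from the sub.
    $value =~ s///;
    return $value;
}

print scoped_demo("a_b-c"), "\n";   # a
```

If the helper's pattern leaked out, the result would be "a_b" instead of "a", so the match made inside the sub appears to be popped off when the sub returns.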