Perl String Matching Problem

I am trying to write a Perl script, a part of which is to match strings from one file with another file, and print out further information from the appropriate line of the file if there is a match. Mostly this is working fine, other than in cases where the text that needs to be matched contains parentheses or brackets.

In these cases a match does not occur, even when the strings are exactly the same; I have confirmed this for the troublesome examples. I suspect it has something to do with parentheses being ‘metacharacters’, but escaping the brackets by putting backslashes before them in both files (at an earlier stage in the script) produced the following error in the matching routine:

Unmatched ) in regex; marked by <-- HERE in m/^) <-- HERE / at datafile_parser.pl line 139, <DATA_TOMATCH_IN> line 219

What am I doing wrong?

My guess from the error message is that you interpolate the text to be matched into the regex, right?

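In that case the special characters have to be escaped in the text that gets interpolated, because Perl inserts the variable's contents into the pattern and the regex engine then reads them as regex syntax. For example, if the text to be matched is (wordinbrackets), the string you interpolate must contain \(wordinbrackets\). If that string is written as a double-quoted literal in your script rather than read from a file, you have to escape twice ("\\(wordinbrackets\\)"), because the string literal consumes one level of backslashes before the regex ever sees it.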
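
A small check of the run-time case, where the pattern text arrives as data (for example read from a file), so the backslashes are real characters in the string. The line and variable names below are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical data line: the text to match, then a tab, then other fields.
my $line = "(wordinbrackets)\tmore data";

# Text as it would arrive from a file at run time. Single quotes keep \( as
# the two characters backslash and paren, just like reading them from a file.
my $escaped_from_file   = '\(wordinbrackets\)';
my $unescaped_from_file = '(wordinbrackets)';

# Interpolated into the pattern: \( is a literal paren, a bare ( opens a group.
my $escaped_matches   = $line =~ m/^$escaped_from_file/   ? 1 : 0;
my $unescaped_matches = $line =~ m/^$unescaped_from_file/ ? 1 : 0;

print "escaped: $escaped_matches, unescaped: $unescaped_matches\n";
```

With one backslash per bracket in the data, the interpolated pattern matches the literal parentheses; without it, ( and ) form a capture group and the pattern looks for wordinbrackets with no parentheses at the start of the line.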

Unless I’ve misread your description, you attempted to fix the problem by putting the backslash in the input file. The place for the backslash is in the regular expression.

Based on this snippet from the error message, change m/^) in the regex in your script to m/^\), which reads “Match on a close paren at the beginning of a line.”
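
One way to narrow down where the error comes from is to compile candidate text the same way m/^$text/ would, inside eval so the failure is caught instead of killing the script. This is a diagnostic sketch; regex_error_for is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: compile $text as the pattern ^$text and return the
# error message, or an empty string if the pattern compiled.
sub regex_error_for {
    my ($text) = @_;
    my $ok = eval { my $re = qr/^$text/; 1 };
    return $ok ? '' : $@;
}

# A lone ")" produces the "Unmatched ) in regex" error from the question.
print regex_error_for(')');

# Balanced brackets compile (as a capture group), so they fail silently instead.
print regex_error_for('(text in brackets)') eq '' ? "compiles\n" : "fails\n";
```

Since the message shows the pattern as m/^) <-- HERE /, the interpolated variable appears to have held just ")" at that point, which suggests looking at how the lines of the temp file were split or read.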

Can you post the full regular expression in question, along with samples of the text it is supposed to match?

Also, what version of Perl are you running?

No, that doesn’t do it. Whether I put three, two, or one backslash before each bracket, the same error message appears.

The text is put into a temporary file, which I then read back into an array to carry out the matching in the second phase of the script. There are two parallel files at this point, and I use a nested foreach loop to carry out the matching. It works fine for everything except the strings with brackets in them.

Or, in actual code,

if ($secondstring =~ m/^\(wordinbrackets\)/) {
   ...
}

is equivalent to

my $firststring = "\\(wordinbrackets\\)";
if ($secondstring =~ m/^$firststring/) {
   ...
}

The reason is that the backslash is itself special inside a double-quoted string, so it must be escaped as well: "\\(" yields the two characters \( that the regex engine needs.
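
A quick check, on a few made-up sample lines, that a literal pattern and a pattern interpolated from the double-quoted string "\\(wordinbrackets\\)" behave the same:

```perl
use strict;
use warnings;

# "\\(" in a double-quoted literal yields the two characters \( so the
# regex engine sees an escaped (literal) parenthesis.
my $firststring = "\\(wordinbrackets\\)";

# Hypothetical sample lines: one match and two near misses.
my @samples = ("(wordinbrackets) rest", "wordinbrackets rest", "x(wordinbrackets)");

my @disagreements;
for my $secondstring (@samples) {
    my $literal      = $secondstring =~ m/^\(wordinbrackets\)/ ? 1 : 0;
    my $interpolated = $secondstring =~ m/^$firststring/        ? 1 : 0;
    push @disagreements, $secondstring if $literal != $interpolated;
    print "$secondstring: literal=$literal interpolated=$interpolated\n";
}
```

Writing the same value with single quotes, '\(wordinbrackets\)', also works, because single-quoted strings keep a backslash that is not followed by another backslash or a quote.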

The post above was a response to tschild.

I wrote the backslashes into the temporary file. The closing parenthesis was not at the beginning of the line in the problematic string. The strings are at the beginning of the line, and the match is supposed to cover the whole string up to the tab, including the brackets.
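
If the goal is to match the whole text up to the first tab, a plain string comparison on that field avoids regex syntax entirely. A minimal sketch, assuming tab-separated lines; first_field_is is a hypothetical name, and unlike m/^$name/ it rejects a line whose first field merely starts with the name:

```perl
use strict;
use warnings;

# Hypothetical helper: true if the text before the first tab in $line is
# exactly $name, brackets included. No regex metacharacters are involved.
sub first_field_is {
    my ($line, $name) = @_;
    my ($first_field) = split /\t/, $line, 2;
    return defined $first_field && $first_field eq $name;
}

my $line = "string to be matched (text in brackets)\t3\tNumber 41";
print first_field_is($line, 'string to be matched (text in brackets)')
    ? "match\n" : "no match\n";
```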

This is the string that is causing the problem (I’ve edited the actual text, but the structure is exactly the same):
string to be matched (text in brackets) 3 Number 41

if ($full_in_line =~ m/^$the_example_name/)
{
    print DEBUG_OUT $the_example_name;
    print DEBUG_OUT "\n";
    $match_found_in_iteration = 1;
}

where $the_example_name is a string from the other temp file, which lists the strings (one per line) for which I need to pull out other data from the big file, and $full_in_line is an element of the array read from that big file.

Perhaps you could print out the content of $the_example_name, in a line before the regex, to check if it does in fact contain the string that you expect it to contain?
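
When printing the variable for debugging, it can help to wrap it in markers and spell out invisible characters, since a trailing newline or carriage return left over from reading a file would also stop the match. A sketch with a hypothetical visible helper:

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a value in [ ] and show tabs, carriage returns
# and newlines as \t, \r and \n so stray characters are easy to spot.
sub visible {
    my ($s) = @_;
    (my $shown = $s) =~ s/\t/\\t/g;
    $shown =~ s/\r/\\r/g;
    $shown =~ s/\n/\\n/g;
    return "[$shown]";
}

# Example value as it might come straight from <FH>, newline still attached.
my $the_example_name = "string to be matched (text in brackets)\n";
print STDERR visible($the_example_name), "\n";
chomp $the_example_name;   # remove the trailing newline
print STDERR visible($the_example_name), "\n";
```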

A somewhat inelegant way to get around the ‘special characters in regexes’ problem would be to use a string comparison:

if (substr($full_in_line, 0, length($the_example_name)) eq $the_example_name) {
   ...
}
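
The same prefix comparison wrapped in a small reusable function, as a sketch; starts_with is a hypothetical name. index($line, $prefix) == 0 is an equivalent test, though it can scan past the start of the line when there is no match:

```perl
use strict;
use warnings;

# Literal prefix test with no regex: compare the first length($prefix)
# characters of $line with $prefix using string equality.
sub starts_with {
    my ($line, $prefix) = @_;
    return substr($line, 0, length $prefix) eq $prefix;
}

print starts_with("string (text in brackets)\t3", "string (text in brackets)")
    ? "match\n" : "no match\n";
```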


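To stop metacharacters in an interpolated string from being treated as regex syntax, wrap the variable in \Q and \E, which quotes the interpolated text (the same as calling quotemeta on it):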

m/\Q$some_string\E/
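
Applied to the nested foreach loop described earlier, a sketch might look like this. The arrays and their contents are made-up stand-ins for the two temp files, and the trailing \t (requiring a tab right after the name) is an added assumption based on the tab-separated layout, not part of the original regex:

```perl
use strict;
use warnings;

# Assumed inputs, already chomped: @names has one search string per element,
# @lines holds the lines of the big data file.
my @names = ('string to be matched (text in brackets)', 'plain name');
my @lines = (
    "string to be matched (text in brackets)\t3\tNumber 41",
    "plain name\t7\tNumber 2",
    "unrelated [entry]\t1\tNumber 9",
);

my @report;
foreach my $the_example_name (@names) {
    foreach my $full_in_line (@lines) {
        # \Q...\E escapes ( ) [ ] and other metacharacters in the name
        if ($full_in_line =~ m/^\Q$the_example_name\E\t/) {
            push @report, $full_in_line;
        }
    }
}
print "$_\n" for @report;
```

With \Q in place, no escape characters need to be written into the temp files at all.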

Thanks, friedo, that seems to have done the trick (without any escape characters inserted into the temp files). The troublesome entry is now in the report file with the others, and I didn’t have to fudge it by substituting the brackets out.